Remarks

The Rejection of Claims 8-10 Under 35 U.S.C. § 101

The Final Office Action maintains the rejection of claims 8-10 under 35 U.S.C. § 101 as
not having an apparent or disclosed specific, substantial, and credible utility. Applicant 
respectfully traverses the rejection. 

The utility requirement for a claimed invention is met if a person of ordinary skill in the 
art would immediately appreciate why the invention is useful based on the characteristics of the 
invention and if the utility is specific, substantial, and credible. Manual of Patent Examining 
Procedure (M.P.E.P.) § 2107(II), 8th ed. The asserted utility of a claimed invention is credible
unless (A) the logic underlying the assertion is seriously flawed or (B) the facts upon which the
assertion is based are inconsistent with the logic underlying the assertion. M.P.E.P.
§ 2107.02(III)(B). 

The rejection is based on the U.S. Patent and Trademark Office's doubt about Applicant's
in silico identification of the claimed polypeptide as a lipoxin A4 receptor. The application
teaches that the claimed polypeptide is a lipoxin A4 receptor. In the response filed September 8,
2003, Applicant pointed out that the specification discloses that the polypeptide of SEQ ID NO:2
comprises seven transmembrane domains, which is characteristic of G protein-coupled receptors 
(GPCR); lipoxin A4 receptors are GPCRs. Applicant also pointed to three exhibits attached to 
the September 8 response: 

• Exhibit A demonstrates that the amino acid sequence of SEQ ID NO:2 ("Query")
aligns with the amino acid sequence of Gorilla low affinity N-formyl peptide
receptor ("Subject") with 27% identity and 47% homology.

• Exhibit B demonstrates that lipoxin A4 and N-formyl peptide receptors are known
to be very closely related (see Exhibit B attached to the response filed September
8, 2003, at the receptors designated FML1_HUMAN:FPRL1, lines 2-3;
FML1_MOUSE:FPRL1, lines 5-6; and O88536, line 25); and

• Exhibit C demonstrates that N-formyl peptide receptors bind lipoxin (Vaughn et
al., J. Immunol. (2002) 169:3363-3369, lines 19-20 of the Abstract).

The Final Office Action dismisses this evidence, disparaging the idea that one skilled in the art
could conclude that proteins showing 27% structural identity to a protein could have a similar
function. GPCRs, however, are highly variable. See Fredriksson et al., "The G-Protein-Coupled
Receptors in the Human Genome Form Five Main Families. Phylogenetic Analysis,
Paralogon Groups, and Fingerprints," Mol Pharmacol 63, 1256-72, 2003 ("Fredriksson";
Attachment 5) at page 1256, column 1.

There are many examples in known GPCR families of members that have about or less 
than 27% sequence identity. For example, in the 5-hydroxytryptamine (serotonin) receptor 
family, receptor 2B has only 28% sequence identity with receptor 1D. See Attachment 1. Yet,
both receptors bind serotonin. 

The γ-aminobutyric acid (GABA) receptor family provides another example. Human
GABA A receptor delta has only 28% sequence identity to the rat GABA-A ε subunit and 27%
sequence identity to C. elegans ionotropic GABA receptor subunit UNC-49Cshort. See
Attachments 2 and 3. Yet all three are GABA receptors. 
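
For illustration only, the percentages quoted above can be checked against the "Identities" counts printed in Attachments 1-3. The short Python sketch below is not part of the record; the function name is hypothetical, and it assumes the reported figure is the ratio of identical positions to aligned positions truncated to a whole percent, which is consistent with the values shown in the attachments.

```python
def whole_percent_identity(identical: int, aligned: int) -> int:
    """Identical positions over aligned positions, truncated to a whole percent.

    Truncation (rather than rounding) is an assumption chosen because it
    reproduces the percentages printed in Attachments 1-3.
    """
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * identical) // aligned


# Counts copied from the "Identities" lines of the attachments.
attachments = {
    "Attachment 1": (95, 339),
    "Attachment 2": (118, 411),
    "Attachment 3": (74, 272),
}

for name, (identical, aligned) in attachments.items():
    pct = whole_percent_identity(identical, aligned)
    print(f"{name}: {identical}/{aligned} = {pct}%")
```

Under that assumption the three attachments give 28%, 28%, and 27%, matching the figures cited in the two preceding paragraphs.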

A survey of minimum similarities of known human GPCRs for the same ligand reveals 
minimum percent identities of as low as 20.44%; the results are provided as Attachment 4.
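
As a hedged illustration of how a per-ligand minimum of this kind can be derived, the Python sketch below scans pairwise rows in the layout used in Attachment 4 and reports the least similar pair. The program that actually produced Attachment 4 is not described in the record; the function name and the regular expression are assumptions made for this example, and the three rows are copied from the Sphingosine 1-phosphate entry of Attachment 4.

```python
import re

# Matches rows such as:
#   P46089 with P47775 = 1269.00 hom = 57.58% len = [ 333 330 334 ]
# The pattern is an assumption based on the row layout in Attachment 4.
ROW = re.compile(r"(\S+) with (\S+) = [\d.]+ hom = ([\d.]+)%")


def minimum_identity(rows):
    """Return (percent_identity, id_a, id_b) for the least similar pair."""
    best = None
    for line in rows:
        match = ROW.search(line)
        if match is None:
            continue  # skip headers, blank lines, and summary lines
        candidate = (float(match.group(3)), match.group(1), match.group(2))
        if best is None or candidate < best:
            best = candidate
    return best


rows = [
    "P46089 with P47775 = 1269.00 hom = 57.58% len = [ 333 330 334 ]",
    "P46089 with P46095 = 1296.00 hom = 59.70% len = [ 333 330 362 ]",
    "P46095 with P47775 = 1244.00 hom = 56.89% len = [ 336 362 334 ]",
]

print(minimum_identity(rows))
```

For these three rows the smallest value is 56.89%, which agrees with the minimum reported for that ligand in Attachment 4.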

Finally, Fredriksson points out that, in fact, some GPCR family members share less than
24% sequence identity: "The frizzled family consists of 10 frizzled receptors, FZD1-10, together
with SMOH, which is the most divergent receptor of the family, sharing only 24% identity with
FZD2 and less with the others." Fredriksson at page 1261, column 2, first full paragraph.

Those of skill in the art are well aware of how variable GPCRs are. Those skilled in the
art certainly would not dismiss evidence identifying Applicant's claimed polypeptide as a lipoxin 
A4 receptor merely because absolute proof was not yet available. Moreover, the identity of the 
disclosed polypeptide immediately makes recognizable its utility. As taught in the specification:
"[r]egulators of lipoxin A4 receptor-like polypeptide can be used to control hemostasis, vascular
reactivity, especially vasoconstriction, and anaphylactic and allergic reactions." Page 8, lines 15- 
17. Thus, it cannot be asserted that the logic underlying the assertion of the claimed 
polypeptides' identity or utility is flawed or that the facts underlying the assertion of the claimed 
polypeptides' utility are inconsistent with the logic underlying the asserted utility.

Applicant respectfully requests withdrawal of this rejection.

The Rejection of Claims 8-10 Under 35 U.S.C. § 112

Claims 8-10 have been rejected under 35 U.S.C. § 112 as not enabled. Applicant
respectfully traverses the rejection.

The Office Action asserts that "since the claimed invention is not supported by either a 
clear asserted utility or a well established utility . . . one skilled in the art would not know how to 
use the claimed invention." Paper 11, page 6, lines 1-4. In response to the rejection under 35 
U.S.C. § 101, Applicant has demonstrated that the claimed polypeptides have utility, mooting the 
asserted basis of the rejection. 

Applicant respectfully requests withdrawal of this rejection.

The Rejection of Claims 8 and 9 Under 35 U.S.C. § 102(a) 

Claims 8 and 9 have been rejected under 35 U.S.C. § 102(a) as anticipated by 
Elshourbagy et al., WO 00/26339 ("Elshourbagy"). Applicant respectfully traverses the
rejection. 

United States Code Title 35, section 102 states that: 

A person shall be entitled to a patent unless — 

(a) the invention was known or used by others in this 
country, or patented or described in a printed publication in this or 
a foreign country, before the invention thereof by the applicant for 
patent. 

Elshourbagy is cited as teaching an amino acid sequence that is 100% identical to SEQ 
ID NO:2. Paper 11, page 6, lines 14-15. Elshourbagy, however, is not prior art to the present
application. Elshourbagy was published on May 11, 2000. The priority date of the present
application is March 14, 2000. The Patent Office asserts that the present application is not
entitled to its priority date of March 14, 2000 because the claimed polypeptides lack utility and 
therefore are not enabled. However, as demonstrated above, the claimed invention has utility. 
The application is entitled to its priority date of March 14, 2000. Thus, Elshourbagy does not 
anticipate claims 8 and 9. 

Applicant respectfully requests withdrawal of this rejection. 

The Rejection of Claim 10 Under 35 U.S.C. § 103(a)

Claim 10 stands rejected under 35 U.S.C. § 103(a) as obvious over Elshourbagy in view 
of Hopp et al., U.S. Patent 5,011,912 ("Hopp"). Applicant respectfully traverses the rejection.
As demonstrated above, Elshourbagy is not prior art to the present application. Thus, 
Elshourbagy cannot be used as a reference under 35 U.S.C. § 103(a). 

Applicant respectfully requests withdrawal of this rejection. 



Respectfully submitted, 
BANNER & WITCOFF, LTD. 





Lisa M. Hemmendinger 
Registration No. 42,653 



Customer No. 22907 

Attachment 1



Query: 5-hydroxytryptamine (serotonin) receptor 2B

>swall|P28221|5H1D_HUMAN [DE: 5-hydroxytryptamine 1D receptor
(5-HT-1D) (Serotonin receptor) (5-HT-1D-alpha).]
[OS: Homo sapiens (Human)] [OC: Eukaryota; Metazoa;
Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
Eutheria; Primates; Catarrhini; Hominidae; Homo]
Length = 377

Score = 160 bits (404), Expect = 7e-38

Identities = 95/339 (28%), Positives = 172/339 (50%), Gaps = 10/339 (2%)

Query: 58   ALLILMVIIPTIGGNTLVILAVSLEKKLQYATNYFLMSLAVADLLVGLFVMPIALLTIMF 117
            A+++ ++ + T+ N V+ + L +KL NY + SLA DLLV + VMPI++ +
Sbjct: 42   AWLSVITLATVLSNAFVLTTILLTRKLHTPANYLIGSLATTDLLVSILVMPISIAYTIT 101

Query: 118  EAMWPLPLVLCPAWLFLDVLFSTASIMHLCAISVDRYIAIKKPIQANQYNSRATAFIKIT 177
            W +LC WL D+ TASI+HLC I++DRY AI ++ ++ + A I
Sbjct: 102  HT-WNFGQILCDIWLSSDITCCTASILHLCVIALDRYWAITDALEYSKRRTAGHAATMIA 160

Query: 178  VVWLISIGIAIPVPIKGIETDVDNPNNITCVLTKERFGDFMLFGSLAAFFTPLAIMIVTY 237
            +VW ISI I+IP P++ +VT+ ++++ AF+ P ++I+ Y
Sbjct: 161  IVWAISICISIP-PLFWRQAKAQEEMSDCLVNTSQI--SYTIYSTCGAFYIPSVLLIILY 217

Query: 238  FLTIHALQKKAYLVKNKPPQRLTWLTVSTWQRDETPCSSPEKVAMLDGSRKDKALPNSG 297
            A+++NP T++ +++G P
Sbjct: 218  GRIYRAARNR ILNPPSLYGKRFTTAHLITGSAGSSLCSLNSSLHEGHSHSAGSPLFF 274

Query: 298  DETLMRRTSTIGKKSVQTISNEQRASKVLGIVFFLFLLMWCPFFITNITLVLC-DSCNQT 356
            + ++ + ++ + + E++A+K+LGI+ F++ W PFF+ ++ L +C DSC
Sbjct: 275  NHVKIKLADSALERKRISAARERKATKILGIILGAFIICWLPFFWSLVLPICRDSC--W 332

Query: 357  TLQMLLEIFVWIGYVSSGVNPLVYTLFNKTFRDAFGRYI 395
            L + F W+GY++S +NP++YT+FN+ FR AF + +
Sbjct: 333  IHPALFDFFTWLGYLNSLINPIIYTVFNEEFRQAFQKIV 371

Attachment 2



Query: gamma-aminobutyric acid (GABA) A receptor, delta

>swall|Q9EQF0|Q9EQF0 [DE: GABA-A epsilon subunit splice variant
(Fragment).] [OS: Rattus norvegicus (Rat)] [OC: Eukaryota;
Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae;
Murinae; Rattus]
Length = 388

Score = 171 bits (432), Expect = 4e-41

Identities = 118/411 (28%), Positives = 186/411 (45%), Gaps = 57/411 (13%)



Query: 63
            P V + + V S+ IS +MEY++ + +Q+W D RL YN T ETL L V +LW+P
Sbjct: 3    PTVWVKVFVNSLGPISILDMEYSIDIIFYQTWYDERLRYNDTFETLILHGNWSQLWIP 62

Query: 123
            DTF N+K +D+T+ N++ + DG +LY++R+T C + + +PMD C L
Sbjct: 63

Query: 183
            S+ Y + + +Y W +I + +L +F T + TE+ + + G F ++ F +
Sbjct: 123

Query: 243
            + R G + Q+Y+PS + +SWVSFWI A AR S +
Sbjct: 182

Query: 303  <:ALDVYFWIC YVFVF AALVEYAFAHF NADYRKKQ 341
            ALD Y IC+V F L+E+ +F A+ R +
Sbjct: 222  - ALDFYI AICFVLCFCTLLEFTVLNFLTYNNIERQASPKFYQFPTNSRANARTRA 275

Query: 342
            +A+ + +R R + +V IV + +A Q SRR + SY V
Sbjct: 276

Query: 398
            +K G +G R R I +D Y+R +FP F NV+YW
Sbjct: 335

Attachment 3

Query: gamma-aminobutyric acid (GABA) A receptor, delta

>swall|Q9U9U3|Q9U9U3 [DE: Ionotropic GABA receptor subunit
UNC-49Cshort.] [OS: Caenorhabditis elegans]
[OC: Eukaryota; Metazoa; Nematoda; Chromadorea;
Rhabditida; Rhabditoidea; Rhabditidae; Peloderinae;
Caenorhabditis]
Length = 264

Score = 108 bits (271), Expect = 2e-22

Identities = 74/272 (27%), Positives = 116/272 (42%), Gaps = 45/272 (16%)

Query: 184  SYGYSSEDIVYYWSESQEHIHGLDKLQLAQFTITSYHFTT ELMNFKSAGQFPRLSL 239
            S YS+ +I Y W S+E A ++SY FT + S+G + RL +
Sbjct: 3    SDAYSTAEIEYKWCTSKEPNCSTAVKADANIELSSYKFTKICQKRTLASTSSGTYSRLRV 62

Query: 240  HFHLRRNRGVYIIQSYMPSVLLVAMSWSFWISQAAVPARVSLGITTVLTMTTLMVSARS 299
            F R+ G Y +Q + P+ L+V +SW+SFWI++ + P+R +G TVLT T LM
Sbjct: 63   SFIFDRDSGFYFLQIFFPASLVWLSWISFWINRDSAPSRTLIGTMTVLTETHLMTGTNR 122

Query: 300  SLPRASAIKALDVYFWICYVFVFAALVEYAFAHF NADYRKKQKAKVKVSRPRAEMD 355
            LP + +KA+DV+ CY+ V AL+EYA + N D R+++K P
Sbjct: 123  RLPPVAYVKAVDVFLGFCYLLVILALIEYACVAYSKKKNEDRRRREKKTEHKPAPPTPDI 182

Query: 356  VRNAIVLFSLSAAGVTQELAISRRQRRVPGNLMGSYRSVGVETGETKKEGAARSGGQGGI 415
            + + + AT +A+ ++ R
Sbjct: 183  LHDVRLAECTCNAAPTS I I AVI KQSNRF 210

Query: 416  RARLRPIDADTIDIYARAVFPAAFAAVNVIYW 447
            + IDI +RA FP F N ++W
Sbjct: 211  CVSHSHIDIVSRAAFPLVFILFNTLFW 237





Attachment 4

Minimum Identity between GPCRs with ligand melatonin: 59.14 %
Minimum Identity between GPCRs with ligand interleukin 8: 77.43 %
Minimum Identity between GPCRs with ligand NPFF: 48.84 %
Minimum Identity between GPCRs with ligand GABA: 31.35 %
Minimum Identity between GPCRs with ligand nicotinic acid: 95.32 %
Minimum Identity between GPCRs with ligand Sphingosine 1-phosphate: 56.89 %
Minimum Identity between GPCRs with ligand Angiotensin II: 34.54 %
Minimum Identity between GPCRs with ligand 2-MeSADP: 25.23 %
Minimum Identity between GPCRs with ligand sphingosine-1-phosphate: 39.09 %
Minimum Identity between GPCRs with ligand Serotonin: 27.97 %
Minimum Identity between GPCRs with ligand CGRP: 20.44 %
Minimum Identity between GPCRs with ligand prostaglandin E2: 29.10 %
Minimum Identity between GPCRs with ligand adenosine: 38.99 %
Minimum Identity between GPCRs with ligand cannabinoids: 44.17 %
Minimum Identity between GPCRs with ligand bradykinin: 36.83 %
Minimum Identity between GPCRs with ligand dopamine: 28.64 %
Minimum Identity between GPCRs with ligand prokinecticin: 86.98 %
Minimum Identity between GPCRs with ligand adrenaline: 32.62 %
Minimum Identity between GPCRs with ligand ET-1: 58.78 %
Minimum Identity between GPCRs with ligand NPY: 29.87 %
Minimum Identity between GPCRs with ligand MCP-1: 32.57 %
Minimum Identity between GPCRs with ligand acetylcholine: 47.61 %
Minimum Identity between GPCRs with ligand MSH: 45.11 %
Minimum Identity between GPCRs with ligand parathyroid hormone: 51.64 %
Minimum Identity between GPCRs with ligand neuromedin K: 80.68 %
Minimum Identity between GPCRs with ligand glutamate: 40.87 %
Minimum Identity between GPCRs with ligand alpha-methylnoradrenaline: 53.11 %
Minimum Identity between GPCRs with ligand arginine vasopressin: 45.82 %
Minimum Identity between GPCRs with ligand neuromedin U: 47.39 %
Minimum Identity between GPCRs with ligand H2 relaxin: 52.92 %
Minimum Identity between GPCRs with ligand GnRH: 34.93 %
Minimum Identity between GPCRs with ligand galanin: 36.68 %
Minimum Identity between GPCRs with ligand histamine: 26.46 %
Minimum Identity between GPCRs with ligand progestins: 30.91 %
Minimum Identity between GPCRs with ligand LTB4: 44.03 %
Minimum Identity between GPCRs with ligand activated thrombin: 33.42 %
Minimum Identity between GPCRs with ligand somatostatin-14: 45.27 %
Minimum Identity between GPCRs with ligand neurotensin: 38.54 %



Ligand: melatonin Number of Entries in DB 2 

P48039 with P49286 = 1401.00 hom = 59.14% len = [ 350 350 362 ] 

Minimum Identity between GPCRs with ligand melatonin: 59.14 % 

Ligand: interleukin 8 Number of Entries in DB 2

P25024 with P25025 = 1771.00 hom = 77.43% len = [ 355 350 360 ]

Minimum Identity between GPCRs with ligand interleukin 8: 77.43 %

Ligand: NPFF Number of Entries in DB 2

Q9GZQ6 with Q9Y5X5 = 1372.00 hom = 48.84% len = [ 423 430 522 ] 

Minimum Identity between GPCRs with ligand NPFF: 48.84 % 

Ligand: GABA Number of Entries in DB 2 

O75899 with Q9UBS5 = 1653.00 hom = 31.35% len = [ 892 941 961 ]

Minimum Identity between GPCRs with ligand GABA: 31.35 %

Ligand: nicotinic acid Number of Entries in DB 2

BAB89345 with P49019 = 2377.00 hom = 95.32% len = [ 363 363 387 ] 

Minimum Identity between GPCRs with ligand nicotinic acid: 95.32 % 

Ligand: Sphingosine 1-phosphate Number of Entries in DB 3 

P46089 with P47775 = 1269.00 hom = 57.58% len = [ 333 330 334 ] 

P46089 with P46095 = 1296.00 hom = 59.70% len = [ 333 330 362 ] 

P46095 with P47775 = 1244.00 hom = 56.89% len = [ 336 362 334 ] 

Minimum Identity between GPCRs with ligand Sphingosine 1-phosphate: 56.89 % 

Ligand: Angiotensin II Number of Entries in DB 2 

P30556 with P50052 = 757.00 hom = 34.54% len = [ 355 359 363 ] 

Minimum Identity between GPCRs with ligand Angiotensin II: 34.54 % 

Ligand: 2-MeSADP Number of Entries in DB 2 

P47900 with Q9BPV8 = 414.00 hom = 25.23% len = [ 342 373 333 ] 

Minimum Identity between GPCRs with ligand 2-MeSADP: 25.23 % 

Ligand: sphingosine-1-phosphate Number of Entries in DB 5

O95136 with Q9H228 = 955.00 hom = 47.88% len = [ 394 353 398 ]
O95136 with Q99500 = 1010.00 hom = 47.03% len = [ 375 353 378 ]
O95136 with O95977 = 680.00 hom = 39.09% len = [ 389 353 384 ]
O95136 with P21453 = 1086.00 hom = 48.16% len = [ 370 353 382 ]
Q99500 with Q9H228 = 1008.00 hom = 43.12% len = [ 390 378 398 ]
O95977 with Q9H228 = 843.00 hom = 44.01% len = [ 396 384 398 ]
P21453 with Q9H228 = 1078.00 hom = 47.12% len = [ 401 382 398 ]
O95977 with Q99500 = 868.00 hom = 41.80% len = [ 385 384 378 ]
P21453 with Q99500 = 1286.00 hom = 52.91% len = [ 387 382 378 ]
O95977 with P21453 = 841.00 hom = 39.53% len = [ 398 384 382 ]

Minimum Identity between GPCRs with ligand sphingosine-1-phosphate: 39.09 %

Ligand: Serotonin Number of Entries in DB 12

P08908 with P28335 = 643.00 hom = 29.22% len = [ 438 421 458 ]
P08908 with P28223 = 604.00 hom = 28.03% len = [ 444 421 471 ]
P08908 with P28222 = 1016.00 hom = 43.33% len = [ 440 421 390 ]
P08908 with P28221 = 977.00 hom = 43.24% len = [ 430 421 377 ]
P08908 with P28566 = 858.00 hom = 42.19% len = [ 419 421 365 ]
P08908 with Q13639 = 627.00 hom = 34.79% len = [ 436 421 388 ]
P08908 with P50406 = 596.00 hom = 29.93% len = [ 437 421 440 ]
P08908 with P47898 = 763.00 hom = 38.66% len = [ 431 421 357 ]
P08908 with P41595 = 627.00 hom = 32.54% len = [ 434 421 481 ]
P08908 with P34969 = 806.00 hom = 34.44% len = [ 447 421 479 ]
P08908 with P30939 = 923.00 hom = 44.54% len = [ 425 421 366 ]
P28223 with P28335 = 1414.00 hom = 51.75% len = [ 484 471 458 ]
P28222 with P28335 = 617.00 hom = 32.05% len = [ 415 390 458 ]
P28221 with P28335 = 644.00 hom = 32.36% len = [ 411 377 458 ]
P28335 with P28566 = 648.00 hom = 33.97% len = [ 379 458 365 ]
P28335 with Q13639 = 615.00 hom = 35.31% len = [ 447 458 388 ]
P28335 with P50406 = 617.00 hom = 28.86% len = [ 439 458 440 ]
P28335 with P47898 = 546.00 hom = 32.21% len = [ 382 458 357 ]
P28335 with P41595 = 1229.00 hom = 45.85% len = [ 500 458 481 ]
P28335 with P34969 = 587.00 hom = 30.35% len = [ 488 458 479 ]
P28335 with P30939 = 676.00 hom = 34.43% len = [ 387 458 366 ]
P28222 with P28223 = 619.00 hom = 31.54% len = [ 400 390 471 ]
P28221 with P28223 = 620.00 hom = 30.77% len = [ 397 377 471 ]
P28223 with P28566 = 669.00 hom = 34.52% len = [ 387 471 365 ]
P28223 with Q13639 = 560.00 hom = 34.54% len = [ 448 471 388 ]
P28223 with P50406 = 609.00 hom = 30.00% len = [ 446 471 440 ]
P28223 with P47898 = 556.00 hom = 30.81% len = [ 375 471 357 ]
P28223 with P41595 = 1226.00 hom = 45.65% len = [ 509 471 481 ]
P28223 with P34969 = 638.00 hom = 28.87% len = [ 506 471 479 ]
P28223 with P30939 = 626.00 hom = 31.15% len = [ 373 471 366 ]
P28221 with P28222 = 1563.00 hom = 62.86% len = [ 382 377 390 ]
P28222 with P28566 = 1071.00 hom = 48.77% len = [ 381 390 365 ]
P28222 with Q13639 = 567.00 hom = 32.22% len = [ 384 390 388 ]
P28222 with P50406 = 602.00 hom = 31.28% len = [ 406 390 440 ]
P28222 with P47898 = 743.00 hom = 38.66% len = [ 386 390 357 ]
P28222 with P41595 = 582.00 hom = 29.49% len = [ 414 390 481 ]
P28222 with P34969 = 780.00 hom = 37.69% len = [ 409 390 479 ]
P28222 with P30939 = 1154.00 hom = 48.91% len = [ 374 390 366 ]
P28221 with P28566 = 1096.00 hom = 49.59% len = [ 376 377 365 ]
P28221 with Q13639 = 568.00 hom = 31.83% len = [ 383 377 388 ]
P28221 with P50406 = 610.00 hom = 30.77% len = [ 383 377 440 ]
P28221 with P47898 = 749.00 hom = 37.54% len = [ 391 377 357 ]
P28221 with P41595 = 570.00 hom = 28.91% len = [ 397 377 481 ]
P28221 with P34969 = 780.00 hom = 38.20% len = [ 407 377 479 ]
P28221 with P30939 = 1100.00 hom = 50.55% len = [ 396 377 366 ]
P28566 with Q13639 = 577.00 hom = 32.33% len = [ 385 365 388 ]
P28566 with P50406 = 568.00 hom = 30.96% len = [ 381 365 440 ]
P28566 with P47898 = 728.00 hom = 35.85% len = [ 382 365 357 ]
P28566 with P41595 = 570.00 hom = 32.33% len = [ 387 365 481 ]
P28566 with P34969 = 762.00 hom = 38.36% len = [ 377 365 479 ]
P28566 with P30939 = 1330.00 hom = 57.26% len = [ 372 365 366 ]
P50406 with Q13639 = 624.00 hom = 34.28% len = [ 434 440 388 ]
P47898 with Q13639 = 529.00 hom = 29.13% len = [ 355 357 388 ]
P41595 with Q13639 = 510.00 hom = 32.47% len = [ 455 481 388 ]
P34969 with Q13639 = 698.00 hom = 36.86% len = [ 437 479 388 ]
P30939 with Q13639 = 629.00 hom = 35.79% len = [ 385 366 388 ]
P47898 with P50406 = 579.00 hom = 30.81% len = [ 361 357 440 ]
P41595 with P50406 = 526.00 hom = 31.14% len = [ 488 481 440 ]
P34969 with P50406 = 654.00 hom = 30.91% len = [ 457 479 440 ]
P30939 with P50406 = 571.00 hom = 30.60% len = [ 375 366 440 ]
P41595 with P47898 = 438.00 hom = 29.13% len = [ 393 481 357 ]
P34969 with P47898 = 734.00 hom = 38.38% len = [ 387 479 357 ]
P30939 with P47898 = 758.00 hom = 36.97% len = [ 380 366 357 ]
P34969 with P41595 = 593.00 hom = 27.97% len = [ 498 479 481 ]
P30939 with P41595 = 591.00 hom = 30.60% len = [ 390 366 481 ]
P30939 with P34969 = 752.00 hom = 36.89% len = [ 374 366 479 ]

Minimum Identity between GPCRs with ligand Serotonin: 27.97 %

Ligand: CGRP Number of Entries in DB 2

P25106 with Q16602 = 96.00 hom = 20.44% len = [ 407 362 461 ]

Minimum Identity between GPCRs with ligand CGRP: 20.44 %

Ligand: prostaglandin E2 Number of Entries in DB 4

P34995 with P35408 = 497.00 hom = 29.10% len = [ 404 402 488 ]
P34995 with P43115 = 610.00 hom = 34.62% len = [ 432 402 390 ]
P34995 with P43116 = 493.00 hom = 36.59% len = [ 420 402 358 ]
P35408 with P43115 = 503.00 hom = 32.05% len = [ 409 488 390 ]
P35408 with P43116 = 664.00 hom = 37.99% len = [ 398 488 358 ]
P43115 with P43116 = 506.00 hom = 34.08% len = [ 401 390 358 ]

Minimum Identity between GPCRs with ligand prostaglandin E2: 29.10 %



Ligand: adenosine Number of Entries in DB 4

P29274 with P29275 = 1277.00 hom = 59.64% len = [ 344 412 332 ]
P29274 with P30542 = 1004.00 hom = 51.23% len = [ 363 412 326 ]
P29274 with P33765 = 823.00 hom = 42.45% len = [ 348 412 318 ]
P29275 with P30542 = 894.00 hom = 46.32% len = [ 336 332 326 ]
P29275 with P33765 = 771.00 hom = 38.99% len = [ 329 332 318 ]
P30542 with P33765 = 1020.00 hom = 49.37% len = [ 327 326 318 ]

Minimum Identity between GPCRs with ligand adenosine: 38.99 %

Ligand: cannabinoids Number of Entries in DB 2

P21554 with P34972 = 953.00 hom = 44.17% len = [ 388 472 360 ]

Minimum Identity between GPCRs with ligand cannabinoids: 44.17 %

Ligand: bradykinin Number of Entries in DB 2

P30411 with P46663 = 766.00 hom = 36.83% len = [ 362 391 353 ]

Minimum Identity between GPCRs with ligand bradykinin: 36.83 %

Ligand: dopamine Number of Entries in DB 5

P14416 with P21917 = 874.00 hom = 42.00% len = [ 463 443 419 ]
P14416 with P35462 = 1335.00 hom = 59.00% len = [ 468 443 400 ]
P14416 with P21918 = 631.00 hom = 31.38% len = [ 486 443 477 ]
P14416 with P21728 = 649.00 hom = 30.47% len = [ 457 443 446 ]
P21917 with P35462 = 924.00 hom = 42.25% len = [ 436 419 400 ]
P21917 with P21918 = 577.00 hom = 33.65% len = [ 468 419 477 ]
P21728 with P21917 = 562.00 hom = 28.64% len = [ 449 446 419 ]
P21918 with P35462 = 627.00 hom = 33.75% len = [ 447 477 400 ]
P21728 with P35462 = 665.00 hom = 32.75% len = [ 419 446 400 ]
P21728 with P21918 = 1681.00 hom = 60.09% len = [ 477 446 477 ]

Minimum Identity between GPCRs with ligand dopamine: 28.64 %

Ligand: prokinecticin Number of Entries in DB 2 

AY089976 with LBRI028 = 2253.00 hom = 86.98% len = [ 384 393 384 ] 

Minimum Identity between GPCRs with ligand prokinecticin: 86.98 % 

Ligand: adrenaline Number of Entries in DB 5

P07550 with P08588 = 1348.00 hom = 54.24% len = [ 457 413 477 ]
P07550 with P35348 = 744.00 hom = 34.38% len = [ 451 413 466 ]
P07550 with P35368 = 747.00 hom = 35.59% len = [ 452 413 520 ]
P07550 with P13945 = 1056.00 hom = 43.87% len = [ 410 413 408 ]
P08588 with P35348 = 760.00 hom = 32.62% len = [ 483 477 466 ]
P08588 with P35368 = 795.00 hom = 36.48% len = [ 553 477 520 ]
P08588 with P13945 = 1332.00 hom = 56.86% len = [ 459 477 408 ]
P35348 with P35368 = 1538.00 hom = 52.58% len = [ 499 466 520 ]
P13945 with P35348 = 729.00 hom = 34.80% len = [ 415 408 466 ]
P13945 with P35368 = 770.00 hom = 37.99% len = [ 473 408 520 ]

Minimum Identity between GPCRs with ligand adrenaline: 32.62 %

Ligand: ET-1 Number of Entries in DB 2

P24530 with P25101 = 1619.00 hom = 58.78% len = [ 460 442 427 ]

Minimum Identity between GPCRs with ligand ET-1: 58.78 %

Ligand: NPY Number of Entries in DB 4

P25929 with Q15761 = 612.00 hom = 30.99% len = [ 448 384 445 ]
P25929 with P50391 = 1119.00 hom = 44.80% len = [ 387 384 375 ]
P25929 with P49146 = 598.00 hom = 30.18% len = [ 400 384 381 ]
P50391 with Q15761 = 584.00 hom = 29.87% len = [ 455 375 445 ]
P49146 with Q15761 = 560.00 hom = 29.92% len = [ 458 381 445 ]
P49146 with P50391 = 614.00 hom = 33.60% len = [ 391 381 375 ]

Minimum Identity between GPCRs with ligand NPY: 29.87 %

Ligand: MCP-1 Number of Entries in DB 2 

P41597 with Q9NPB9 = 642.00 hom = 32.57% len = [ 362 374 350 ] 

Minimum Identity between GPCRs with ligand MCP-1: 32.57 % 

Ligand: acetylcholine Number of Entries in DB 5

P08172 with P20309 = 1388.00 hom = 50.00% len = [ 544 466 590 ]
P08172 with P08173 = 1778.00 hom = 60.94% len = [ 484 466 478 ]
P08172 with P11229 = 1339.00 hom = 47.61% len = [ 485 466 460 ]
P08172 with P08912 = 1343.00 hom = 48.50% len = [ 539 466 532 ]
P08173 with P20309 = 1401.00 hom = 50.42% len = [ 553 478 590 ]
P11229 with P20309 = 1677.00 hom = 60.65% len = [ 554 460 590 ]
P08912 with P20309 = 1782.00 hom = 56.77% len = [ 576 532 590 ]
P08173 with P11229 = 1357.00 hom = 49.78% len = [ 505 478 460 ]
P08173 with P08912 = 1422.00 hom = 50.21% len = [ 543 478 532 ]
P08912 with P11229 = 1635.00 hom = 58.91% len = [ 534 532 460 ]

Minimum Identity between GPCRs with ligand acetylcholine: 47.61 %

Ligand: MSH Number of Entries in DB 4 



P32245 with P33032 = 1323.00 hom = 62.77% len = [ 329 332 325 ]

P32245 with Q01726 = 978.00 hom = 49.21% len = [ 325 332 317 ]

P32245 with P41968 = 1271.00 hom = 57.83% len = [ 334 332 360 ]

P33032 with Q01726 = 916.00 hom = 45.11% len = [ 321 325 317 ]

P33032 with P41968 = 1332.00 hom = 61.23% len = [ 324 325 360 ]

P41968 with Q01726 = 963.00 hom = 48.90% len = [ 338 360 317 ]






Minimum Identity between GPCRs with ligand MSH: 45.11 %
Ligand: parathyroid hormone Number of Entries in DB 2 

P49190 with Q03431 = 1755.00 hom = 51.64% len = [ 609 550 593 ]

Minimum Identity between GPCRs with ligand parathyroid hormone: 51.64 %
Ligand: neuromedin K Number of Entries in DB 2

P29371 with P30098 = 2395.00 hom = 80.68% len = [ 444 465 440 ]

Minimum Identity between GPCRs with ligand neuromedin K: 80.68 % 

Ligand: glutamate Number of Entries in DB 8 



O00222 with Q14416 = 2469.00 hom = 45.30% len = [ 900 908 872 ]

O00222 with Q14832 = 2574.00 hom = 46.86% len = [ 904 908 877 ]

O00222 with Q14833 = 4730.00 hom = 75.11% len = [ 912 908 912 ]

O00222 with Q14831 = 4655.00 hom = 74.34% len = [ 915 908 915 ]

O00222 with Q13255 = 2244.00 hom = 42.73% len = [ 966 908 1194 ]

O00222 with O15303 = 4197.00 hom = 69.78% len = [ 887 908 877 ]

O00222 with P41594 = 2300.00 hom = 42.51% len = [ 943 908 1212 ]

Q14416 with Q14832 = 4089.00 hom = 67.78% len = [ 883 872 877 ]

Q14416 with Q14833 = 2513.00 hom = 46.79% len = [ 915 872 912 ]

Q14416 with Q14831 = 2473.00 hom = 45.41% len = [ 917 872 915 ]

Q13255 with Q14416 = 2406.00 hom = 45.87% len = [ 953 1194 872 ]

O15303 with Q14416 = 2508.00 hom = 46.56% len = [ 892 877 872 ]

P41594 with Q14416 = 2353.00 hom = 45.30% len = [ 934 1212 872 ]

Q14832 with Q14833 = 2537.00 hom = 46.52% len = [ 914 877 912 ]

Q14831 with Q14832 = 2524.00 hom = 45.50% len = [ 917 915 877 ]

Q13255 with Q14832 = 2351.00 hom = 44.81% len = [ 952 1194 877 ]

O15303 with Q14832 = 2519.00 hom = 45.15% len = [ 884 877 877 ]

P41594 with Q14832 = 2397.00 hom = 44.01% len = [ 905 1212 877 ]

Q14831 with Q14833 = 4419.00 hom = 69.63% len = [ 916 915 912 ]

Q13255 with Q14833 = 2310.00 hom = 42.87% len = [ 966 1194 912 ]

O15303 with Q14833 = 4202.00 hom = 70.13% len = [ 897 877 912 ]

P41594 with Q14833 = 2278.00 hom = 41.12% len = [ 914 1212 912 ]

Q13255 with Q14831 = 2267.00 hom = 41.75% len = [ 943 1194 915 ]

O15303 with Q14831 = 4095.00 hom = 66.93% len = [ 887 877 915 ]

P41594 with Q14831 = 2267.00 hom = 40.87% len = [ 948 1212 915 ]

O15303 with Q13255 = 2231.00 hom = 41.73% len = [ 895 877 1194 ]

P41594 with Q13255 = 4800.00 hom = 63.07% len = [ 1267 1212 1194 ]

O15303 with P41594 = 2166.00 hom = 40.94% len = [ 887 877 1212 ]



Minimum Identity between GPCRs with ligand glutamate: 40.87 % 






Ligand: alpha-methylnoradrenaline Number of Entries in DB 2 

P18089 with P18825 = 1401.00 hom = 53.11% len = [ 473 450 461 ]

Minimum Identity between GPCRs with ligand alpha-methylnoradrenaline: 53.11 % 

Ligand: arginine vasopressin Number of Entries in DB 2 

P30518 with P47901 = 967.00 hom = 45.82% len = [ 404 371 424 ] 

Minimum Identity between GPCRs with ligand arginine vasopressin: 45.82 % 

Ligand: neuromedin U Number of Entries in DB 2 

AAC02680 with AF292402 = 1188.00 hom = 47.39% len = [ 420 403 414 ] 

Minimum Identity between GPCRs with ligand neuromedin U: 47.39 % 



Ligand: H2 relaxin Number of Entries in DB 2

LBRI125 with Q9HBX9 = 2684.00 hom = 52.92% len = [ 751 754 757 ]

Minimum Identity between GPCRs with ligand H2 relaxin: 52.92 %

Ligand: GnRH Number of Entries in DB 2

NP_476504 with P30968 = 715.00 hom = 34.93% len = [ 245 292 328 ]

Minimum Identity between GPCRs with ligand GnRH: 34.93 %

Ligand: galanin Number of Entries in DB 3

O43603 with P47211 = 843.00 hom = 39.83% len = [ 368 387 349 ]

O43603 with O60755 = 1282.00 hom = 58.15% len = [ 393 387 368 ]

O60755 with P47211 = 777.00 hom = 36.68% len = [ 349 368 349 ]

Minimum Identity between GPCRs with ligand galanin: 36.68 %

Ligand: histamine Number of Entries in DB 4

P25021 with P35367 = 459.00 hom = 31.20% len = [ 496 359 487 ]

P25021 with Q9Y5N1 = 394.00 hom = 26.46% len = [ 437 359 445 ]

P25021 with Q9H3N8 = 398.00 hom = 27.02% len = [ 417 359 390 ]

P35367 with Q9Y5N1 = 616.00 hom = 29.89% len = [ 496 487 445 ]

P35367 with Q9H3N8 = 505.00 hom = 29.49% len = [ 485 487 390 ]

Q9H3N8 with Q9Y5N1 = 957.00 hom = 43.85% len = [ 444 390 445 ]

Minimum Identity between GPCRs with ligand histamine: 26.46 %

Ligand: progestins Number of Entries in DB 3

AA047233 with NP_060175 = 515.00 hom = 30.91% len = [ 339 346 330 ]

AA047233 with Q8N6D3 = 1109.00 hom = 49.13% len = [ 355 346 354 ] 

NP_060175 with Q8N6D3 = 514.00 hom = 32.12% len = [ 354 330 354 ] 

Minimum Identity between GPCRs with ligand progestins: 30.91 % 
Ligand: LTB4 Number of Entries in DB 2 

Q15722 with Q9NPC1 = 803.00 hom = 44.03% len = [ 364 352 389 ] 

Minimum Identity between GPCRs with ligand LTB4 : 44.03 % 
Ligand: activated thrombin Number of Entries in DB 2 

AAC25699 with O00254 = 701.00 hom = 33.42% len = [ 373 385 374 ]

Minimum Identity between GPCRs with ligand activated thrombin: 33.42 % 



Ligand: somatostatin-14 Number of Entries in DB 4

P30872 with P32745 = 1081.00 hom = 45.27% len = [ 402 391 418 ]

P30872 with P31391 = 1467.00 hom = 60.82% len = [ 406 391 388 ]

P30872 with P30874 = 1140.00 hom = 48.24% len = [ 380 391 369 ]

P31391 with P32745 = 1091.00 hom = 47.94% len = [ 412 388 418 ]

P30874 with P32745 = 1198.00 hom = 50.14% len = [ 390 369 418 ]

P30874 with P31391 = 1095.00 hom = 46.61% len = [ 387 369 388 ]



Minimum Identity between GPCRs with ligand somatostatin-14 : 45.27 % 
Ligand: neurotensin Number of Entries in DB 2 

O95665 with P30989 = 1004.00 hom = 38.54% len = [ 443 410 418 ]

Minimum Identity between GPCRs with ligand neurotensin: 38.54 % 



Minimum Identity between GPCRs with same ligand: 20.44 % 
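
The per-ligand minima in the listing above can be recomputed from the pairwise lines themselves. The following is a minimal sketch, assuming each pair sits on one line in the layout "A with B = score hom = xx.xx% len = [ ... ]" and each group is introduced by a "Ligand:" line; the function name and the regular expression are illustrative and are not part of the exhibit.

```python
import re

# One pairwise line: "<acc1> with <acc2> = <score> hom = <identity>% ..."
PAIR = re.compile(
    r"^(?P<a>\S+) with (?P<b>\S+) = (?P<score>[\d.]+) hom = (?P<ident>[\d.]+)%"
)

def min_identity_by_ligand(text):
    """Return {ligand: lowest pairwise identity in percent}."""
    lows = {}
    ligand = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Ligand:"):
            # Keep only the ligand name, dropping the entry count.
            ligand = line[len("Ligand:"):].split("Number of Entries")[0].strip()
            continue
        match = PAIR.match(line)
        if match and ligand is not None:
            ident = float(match.group("ident"))
            lows[ligand] = min(ident, lows.get(ligand, ident))
    return lows

sample = """\
Ligand: MCP-1 Number of Entries in DB 2
P41597 with Q9NPB9 = 642.00 hom = 32.57% len = [ 362 374 350 ]
Ligand: MSH Number of Entries in DB 4
P32245 with P33032 = 1323.00 hom = 62.77% len = [ 329 332 325 ]
P33032 with Q01726 = 916.00 hom = 45.11% len = [ 321 325 317 ]
"""
result = min_identity_by_ligand(sample)
```

Taking the minimum over all groups would then give the overall figure reported on the last line of the listing.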
ABSTRACT 

The superfamily of G-protein-coupled receptors (GPCRs) is 
very diverse in structure and function and its members are 
among the most pursued targets for drug development. We 
identified more than 800 human GPCR sequences and simul- 
taneously analyzed 342 unique functional nonolfactory human 
GPCR sequences with phylogenetic analyses. Our results 
show, with high bootstrap support, five main families, named 
glutamate, rhodopsin, adhesion, frizzled/taste2, and secretin, 
forming the GRAFS classification system. The rhodopsin family 
is the largest and forms four main groups with 13 sub-
branches. Positions of the GPCRs in chromosomal paralogons
regions indicate the importance of tetraploidizations or local
gene duplication events for their creation. We also searched for 
"fingerprint" motifs using Hidden Markov Models delineating 
the putative inter-relationship of the GRAFS families. We show 
several common structural features indicating that the human 
GPCRs in the GRAFS families share a common ancestor. This 
study represents the first overall map of the GPCRs in a single 
mammalian genome. Our novel approach of analyzing such 
large and diverse sequence sets may be useful for studies on 
GPCRs in other genomes and divergent protein families. 



The superfamily of G-protein-coupled receptors (GPCRs) is 
one of the largest families of proteins in the mammalian 
genome (Lander et al., 2001; Venter et al., 2001). It has been
estimated that more than half of all modern drugs are tar-
geted at these receptors (Flower, 1999), and several ligands
for GPCRs are found among the worldwide top-100-selling
pharmaceutical products. It is also evident that drugs have 
still only been developed to affect a very small number of the 
GPCRs, and the potential for drug discovery within this field 
is enormous. 

The ligands for the GPCRs have tremendous variation;
ions, organic odorants, amines, peptides, proteins, lipids, nu- 
cleotides, and even photons are able to mediate their message 
through these proteins. The GPCR proteins are also highly 
variable. There are two main requirements for a protein to be 
classified as a GPCR. The first requirement relates to seven 
sequence stretches of about 25 to 35 consecutive residues 
that show a relatively high degree of calculated hydrophobic-
ity. These sequences are believed to represent seven α-helices
that span the plasma membrane in a counter-clockwise
manner, forming a receptor, or a recognition and connection
unit, enabling an extracellular ligand to exert a specific effect
into the cell. The second principal requirement is the ability
of the receptor to interact with a G-protein. There is a great
diversity in the functional coupling of the GPCRs; they have
a number of alternative signaling pathways, interacting di-
rectly with a number of other proteins. Interaction with
G-proteins has not been demonstrated for most GPCRs, in 
particular for those whose genes have just recently been 
sequenced. It may therefore be more technically correct to 
term this superfamily "seven transmembrane (TM) recep- 
tors", but the GPCR terminology is more established. 
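
The first requirement above, seven stretches of high calculated hydrophobicity, can be illustrated with a sliding-window hydropathy scan. This is only a sketch: the Kyte-Doolittle scale, the 19-residue window, and the 1.6 cutoff are common conventions assumed here for illustration and are not stated in the text, and the toy sequence is synthetic.

```python
# Kyte-Doolittle hydropathy values (standard published scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Return (start, end) spans covered by runs of windows whose
    mean hydropathy is at or above the threshold."""
    segments = []
    run_start = None
    last = None
    for i in range(len(seq) - window + 1):
        mean = sum(KD.get(aa, 0.0) for aa in seq[i:i + window]) / window
        if mean >= threshold:
            if run_start is None:
                run_start = i
            last = i
        elif run_start is not None:
            segments.append((run_start, last + window))
            run_start = None
    if run_start is not None:
        segments.append((run_start, last + window))
    return segments

# Synthetic example: seven 25-residue hydrophobic blocks between polar linkers.
linker = "DEKRNQDEKRNQDEKRNQDE"
helix = "L" * 25
toy = linker + (helix + linker) * 7
segments = hydrophobic_segments(toy)
```

A real receptor sequence would need more careful treatment (short loops, amphipathic helices), so the count of segments is a rough screen rather than a classification.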

Several classification systems have been used to sort out 
this superfamily. Some systems group the receptors by how 
their ligand binds, and others have used both physiological 
and structural features. One of the most frequently used
systems uses clans (or classes) A, B, C, D, E, and F, and
subclans are assigned using roman number nomenclature
(Attwood and Findlay 1994; Kolakowski, 1994). This A-F 
system is designed to cover all GPCRs, in both vertebrates 
and invertebrates. Some families in the A-F system do not 
exist in humans. Examples of this are clans D and E, which 
represent fungal pheromone receptors and cAMP receptors, 
family IV in clan A, which is composed of invertebrate opsin 
receptors, and clan F, which contains archaebacterial opsins. 



ABBREVIATIONS: GPCR, G-protein-coupled receptor; TM, transmembrane; HMM, Hidden Markov Models. 
The overall classification of the GPCRs has been hampered 
by the large sequence differences between mammalian and 
invertebrate GPCRs. The GPCRs in Drosophila melanogaster 
show in many cases little resemblance to those in mammals 
(Broeck, 2001). Certain species show also a high difference in 
the numbers of receptor genes in different classes. Caeno-
rhabditis elegans, a worm, has, for example, developed a
remarkable number of chemosensory (olfactory) GPCRs re- 
lated to the creature's specific lifestyle. Those chemosensory 
receptors, as well as the olfactory receptors in D. melano-
gaster, do not show any clear resemblance to the olfactory
receptors in humans.

Gene duplication occurs both by individual duplication, 
which often leaves the new gene near the parent gene, and by 
block duplications involving chromosomal regions or entire 
chromosomes. Large-scale duplications, including 
polyploidizations, are believed to be an important mechanism 
of vertebrate evolution. Two rounds of large-scale duplica-
tions are thought to have occurred in early vertebrate ances-
try (Lundin, 1993; Holland et al., 1994), resulting in up to
four copies of each gene in mammals, which originate from a
common ancestor gene in a cephalochordate. It is now known
as the "2R hypothesis" or the "one-to-four model". This has 
led to the construction of maps that contain paralogous chro- 
mosomal regions, or paralogons (Lundin, 1993; Holland et 
al., 1994; Katsanis et al., 1996; Popovici et al., 2001), in 
vertebrates, which in combination with phylogenetic analysis 
can provide valuable information on gene relationships and 
origins. 

In this study, we collected a large set of GPCR sequences in 
the human genome and performed multiple phylogenetic 
analyses. The first task was to compile a comprehensive data 
set with just a single copy of each gene. We wanted to avoid
polymorphism, pseudogenes, duplicates (resulting from the 
same gene having multiple names), and other related prob- 
lems. We identified more than 800 GPCRs in databases and 
simultaneously analyzed sequences of 342 unique functional
nonolfactory human GPCRs and grouped them by phyloge- 
netic analysis. The chromosomal localization and positioning 
in paralogous groups of the genes were studied to give insight 
into the mechanism involved in creating the receptor genes. 
The different families were also analyzed for common se- 
quence motifs, and we discuss the evidence for common de- 
scent of the families. 



Materials and Methods 

Data Retrieval. Approximately 200 GPCRs, both orphans and 
characterized receptors, known from the literature were downloaded
from the GenBank database using the Entrez data-retrieval tool
(http://www.ncbi.nlm.nih.gov/Entrez/). This data set was considered
the start set, and all the genes were manually searched against the 
human genome database using BLASTP (Altschul et al., 1997) on the 
protein database. New receptors that were not already in the data 
set were saved and included. At least 20 of the most significant 
BLAST hits (sorted by E-Value), for each receptor, were checked to 
further extend the data set obtaining the first crude database. Du-
plicates were removed from this data set using a crude phylogenetic
analysis. Thereafter, Entrez was used in keyword searches to iden- 
tify orphan receptors, which are usually named GPRnnn, where nnn 
is a number. In our case searches were made with nnn ranging from
1 to 150. 

To extend the data set, searches were made with all receptor
sequences in the data set against the human genome protein data-
base at NCBI. All genes were screened against the first version of the
database to avoid duplicates. To identify possible novel receptors, not 
yet annotated in the human genome database at NCBI, we searched 
with a diverse set of GPCR receptors at the nucleotide level using 
BLASTX against the Genescan data set. A P value of 0.001 was used 
as a threshold or a maximum of 100 BLAST hits were analyzed for 
each search. 

The genes were named according to the convention used in the 
human genome database at NCBI, although several orphan GPCRs, 
which recently had their ligands identified, were subsequently re-
named according to recent literature. If no name was assigned to a
specific sequence in the database, these were assigned GPR numbers 
as provided by the HUGO nomenclature committee. Sequences not 
present in the human genome database were given either an ac-
cepted name from the literature or the GenBank accession number.
Accurate chromosomal positions were obtained from the University
of California Santa Cruz "the golden gate" human genome database
(http://genome.ucsc.edu), the Dec 2001 assembly. If not present in
the public genome assembly, we used the chromosomal position from
the Celera database (http://www.celera.com). 

Alignment. Each data set was randomized 20 times with regard 
to sequence input order using a program called Randfasta
(http://www.neuro.uu.se/medfarm/schiothSoft.html), because the input or-
der of sequences is known to affect the resulting alignment. These 20
data sets, containing the full set of sequences but in different order,
were all aligned using the Win32 version of ClustalW 1.81 (Thomp- 
son et al., 1994). The default alignment parameters were applied. 
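
The input-order randomization described above can be sketched as follows. This is not the Randfasta program itself, whose implementation is not shown in the text; the function names, the fixed seed, and the placeholder records are assumptions made for illustration.

```python
import random

def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def shuffled_copies(records, n_copies=20, seed=0):
    """Return n_copies lists holding the same records in random order."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    copies = []
    for _ in range(n_copies):
        order = list(records)
        rng.shuffle(order)
        copies.append(order)
    return copies

# Placeholder records, not real receptor sequences.
fasta_text = ">seqA\nMKTLLV\nAAGG\n>seqB\nMLLPPA\n>seqC\nMGGAVL\n"
records = read_fasta(fasta_text)
copies = shuffled_copies(records)
```

Each shuffled copy would then be written back out and aligned separately, giving 20 alignments of the same sequences.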

Sequence Bootstrapping and Randomization. The 20 align- 
ments were all bootstrapped 50 times using SEQBOOT from the 
Phylip package (Felsenstein, 1993) to obtain a total of 1000 different 
alignments from each dataset. 
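
The arithmetic of this step (20 alignments, 50 replicates each, 1000 in total) and the usual form of sequence bootstrapping, resampling alignment columns with replacement, can be sketched as below. This is an illustration under stated assumptions, not SEQBOOT's code: alignments are represented as dictionaries of equal-length strings, and the toy alignment is invented.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample the columns of one alignment with replacement."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

def bootstrap_replicates(alignments, n_per_alignment=50, seed=0):
    """Return n_per_alignment pseudo-replicates for every input alignment."""
    rng = random.Random(seed)
    return [bootstrap_alignment(aln, rng)
            for aln in alignments
            for _ in range(n_per_alignment)]

# Toy alignment standing in for each of the 20 order-randomized alignments.
toy_alignment = {"seqA": "MK-LLV", "seqB": "MKTLIV", "seqC": "MR-LLA"}
replicates = bootstrap_replicates([toy_alignment] * 20)
```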

Neighbor-Joining Trees. Protein distances were calculated us- 
ing Protdist from the Win32 version of the Phylip package. For the 
calculation, the Dayhoff PAM matrix was used. The trees were cal-
culated on the 20 different distance matrixes, previously generated
with Protdist, using neighbor from the Phylip package, resulting in
20 files with 50 trees each. All trees were unrooted. Because of 
limitations in the Consense program (version 3.5; Felsenstein, 1993), 
a consensus tree for the complete rhodopsin family could not be 
calculated; therefore, 300 bootstrap replicas were used. The trees 
were plotted using Treeview
(http://taxonomy.zoology.gla.ac.uk/rod/treeview.html).

Maximum Parsimony Trees. Maximum parsimony trees were 
calculated from the same input files that were used for Protdist using 
Protpars from the Phylip package. The trees were unrooted and
calculated using ordinary parsimony, and the topologies were ob-
tained using the built-in tree search procedure. As above, consensus
trees were calculated using Consense 3.5 from Phylip and trees were 
plotted using Treeview. 

Calculating the Overall Relationship of the Main GPCR 
Families Using Random Selection of Genes. These calculations 
are based on all members from four of the main groups: secretin,
frizzled, glutamate, and adhesion, together with 20 randomly se-
lected rhodopsin receptors, selected using Randfasta. Randfasta was
used to randomize the input order of sequences 20 times. The 20
datasets were aligned, sampled using SEQBOOT (50 replicas each),
and 1000 parsimony trees were calculated using Protpars and con-
sensus trees were calculated using Consense 3.5.
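
A bootstrap support value from such a run is a count of how many of the 1000 replicate trees contain a given grouping. The sketch below assumes each tree has already been reduced to a collection of clades (sets of taxon names); Consense itself works on tree files, and the clade names and counts here are invented for illustration.

```python
from collections import Counter

def clade_support(trees):
    """Count, for every clade, the number of trees that contain it."""
    counts = Counter()
    for clades in trees:
        counts.update(set(clades))  # count each clade once per tree
    return counts

# Hypothetical replicate trees, each listed as its clades.
ab = frozenset({"A", "B"})
ac = frozenset({"A", "C"})
trees = [[ab]] * 862 + [[ac]] * 138
support = clade_support(trees)
```

Dividing a count by the number of replicates gives the support as a fraction, which is how a threshold such as 50% would be applied.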

Fingerprint Analysis. For the fingerprint/motif analyses an ap- 
proach using Hidden Markov Models (HMM) was applied as imple-
mented in the HMMER 2.1 package (Eddy, 1998), recompiled for
WIN32 using Visual C++ 6.0. From the secretin, adhesion, gluta-
mate, rhodopsin and frizzled families, alignments of the entire cod-
ing regions were constructed using ClustalW 1.81; from these align-
ments, one HMM per family was calculated using HMMbuild.
The model allowed local alignments within the HMM, global align-
ments with respect to the query sequence, and multiple domains per
sequence to hit. All HMMs were calibrated using HMMcalibrate. To
define the transmembrane regions statistically described by the 
HMMs, the transmembrane region as described in the literature for 
one of the members of each family was aligned to the respective 
HMM using HMMsearch. The sequences used were FZD3, GRM1,
GLP1, LEC1, and ADRB2. The identified TM regions from the
HMMs were subsequently aligned to each other, region by region,
using ClustalW 1.81, and conserved motifs were identified in the
HMM alignments by manual inspection. 

Results 

The schematic presentation of the approach used for re- 
trieving sequences and the overall phylogenetic analysis is 
Fig. 1. Flowchart describing the sequence analysis strategy used in this work. The first step was to construct a database of GPCRs in the human 
genome. Using the Entrez online data retrieval tool and keyword searches, we downloaded approximately 200 GPCRs known from literature. Most 
GPCRs have several names, and they have also been deposited in database under several entries. Therefore, the database was carefully checked to 
remove any duplicated genes throughout the process. The approximately 200 human GPCRs were considered our "seeding" set, and the sequences were 
manually searched against the human genome database and the NR database to extend the GPCR database. Primary phylogenetic analyses, using 
a small number of bootstrap replicas and no randomization of the input file, were performed on the sequences in the database to identify splice 
variants, polymorphism, and duplicates. The final step was to search against the human genome GeneScan database, which contains genes predicted 
from the genome sequence by the GeneScan algorithm, to obtain possible nonannotated genes. The large phylogenetic analyses were carried out as 
described under Materials and Methods. Briefly, the Fasta file containing all sequences in the database was randomized using the Randfasta program 
to randomize the input order of the sequences, because the input order of the sequences can influence the resulting alignment. The sequences in these 
files were subsequently aligned and each of the sequence files was bootstrapped using the SEQBOOT software to obtain 1000 replicas of the alignment. 
Neighbor joining trees were constructed using the Protdist, Neighbor, and Consense programs. From this initial tree, the rhodopsin-like GPCRs and 
the nonrhodopsins were identified. The rhodopsin family was analyzed as one unit using the same strategy as above; from that analysis, the olfactory 
receptors and the four rhodopsin groups were identified. This analysis was carried out several times using both maximum parsimony and neighbor 
joining methods, and the groups that were finally defined were consensus groups from all these trees. A few receptors did not show stable topology 
in any group and these are discussed separately under Results. The nonrhodopsin receptors were analyzed both as full-length receptors and with the 
N- and C-termini removed, as shown in Fig. 2. To investigate how the rhodopsin family is related to the nonrhodopsins, 20 rhodopsins were randomly 
selected and included in the calculations. These analyses were repeatedly performed using the maximum parsimony method, with the dataset 
randomized as above, using Protpars and Consense. The four rhodopsin groups were also analyzed using maximum parsimony in the same way as 
described for the nonrhodopsin, but also using neighbor joining and maximum likelihood methods as described under Materials and Methods. These 
trees are not presented in this work but were used to identify instabilities in the topologies and are available upon request. 
shown in Fig. 1. Detailed descriptions of the different steps 
are given under Materials and Methods. We assembled a 
primary data set of 802 unique GPCRs from the human 
genome. We believe that this data set contains most of the 
functional GPCRs in the human genome. The results show
that the receptors cluster in five main families that we term 
glutamate (G, with 15 members), rhodopsin (R, 701), adhe- 
sion (A, 24), frizzled/taste2 (F, 24) (frequently abbreviated to 
frizzled hereafter), and secretin (S, 15), to which we apply the 
acronym GRAFS. Twenty-three protein sequences could not 
be assigned to any of the five families with appreciable boot- 
strap values (above 50%); these are discussed separately 
below under the section "other 7TM receptors". Figure 2
shows trees describing the overall relationship between the
five main families of GPCRs. The bootstrap values shown in
Fig. 2 separating the respective family from its closest neigh-
bor [secretin (862), adhesion (789), glutamate (839), and friz-
zled (774)] are high; together with the overall topology, they
give good support for each of the GRAFS families. The phy-
logenetic analysis shown in Fig. 2 is performed on protein
sequences in which the N and C termini were deleted (see 
detailed comments below), whereas analysis on the full- 
length sequences also provided good support for five main 
families (data not shown). It should be noted here that the
five families represent the smallest number of clusters that
the phylogenetic analysis can delineate from the data set 
with appreciable bootstrap values and that the phylogenetic 
analysis does not show sufficient bootstrap support to link 
any of the GRAFS families together. It is possible, however, 
to further subdivide each family, because several bootstrap 
values within them show very high values. For example, the
GABA receptors could be divided from the other receptors in 
the glutamate family, but because there are appreciable boot- 
strap values that link them within the glutamate family, we 
have decided to stick to this minimum number of families
(i.e., five). The rhodopsin family has by far the largest num-
ber of receptors and was therefore further subdivided into
four main groups and 13 branches (see below). 
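
As a quick consistency check on the figures given in this section, the five family sizes plus the 23 unassigned sequences should add up to the 802 receptors in the primary data set. The short sketch below only restates numbers from the text; the variable names are illustrative.

```python
# Family sizes as stated in the Results text.
families = {
    "glutamate": 15,
    "rhodopsin": 701,
    "adhesion": 24,
    "frizzled/taste2": 24,
    "secretin": 15,
}
unassigned = 23  # sequences not placed in any family with bootstrap above 50%

assigned = sum(families.values())
total = assigned + unassigned
rhodopsin_share = families["rhodopsin"] / total
```

Here `total` comes to 802, matching the size of the primary data set, and `rhodopsin_share` shows how strongly the rhodopsin family dominates it.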




Fig. 2. Phylogenetic relationship between the GPCRs (TMI-TMVII) in the human genome. The tree was calculated using the maximum parsimony 
method on 1000 replicas of the data set terminally truncated GPCR as described under Materials and Methods. The position of the rhodopsin family 
was established by including twenty random receptors from the rhodopsin family. These branches were removed from the final figure and replaced 
by an arrow toward the rhodopsin family analysis in Fig. 3. 



1260 Fredriksson et al.



The receptors in all of the main families, except the rho- 
dopsin family, have long N termini, whereas the rhodopsin 
family has only a few members with this characteristic. 
These long N termini are especially evident for the receptors 
within the adhesion family, but the secretin, glutamate, and 
frizzled receptors have also rather long N termini that are 
fairly rich in Cys residues. The only significant common 
feature of the proteins is the seven TM stretch; from an
evolutionary perspective, it could be misguided to include 
these diverse and long N termini in the analysis. The number 
of evolutionary events needed for generating long N termini 
is likely to be more related to, for example, the number of
domains than the replacements of single amino acids used for 
the phylogenetic calculations. Therefore, we decided to use 
the truncated receptors, where we use the sequence from the
start of the TMI to the end of TMVII, for the main tree 
presented in Fig. 2. Each of the receptors was thus manually 
cut to provide this data set. 

Below we give comments to our results for each of the 
families. The number of receptors in each family is indicated 
in parentheses. At the end of each section, we list the recep- 
tor names. First, we give the sequence identification name in 
bold. We provide the HUGO name in parenthesis in those 
cases in which it is different from the name we found to be 
most appropriate, for various reasons, except for the chemo- 
kine receptors (found in the rhodopsin family). HUGO lists 
only a few chemokine receptors, and the current naming
system is thus not appropriate until it is more complete. We
did not add their names in parenthesis, because we would
have ended up with the same name for different receptors in 
our lists. After the name, we list the sequence accession code 
followed by the chromosomal position. We want the reader to 
be aware that many of the receptors have multiple additional 
names; a list with alternative names, which can be found 
online (http://www.neuro.uu.se/medfarm/schiothArt.html),
includes many of the names provided by ENSEMBL (http:// 
www.ensembl.org/). 

The Secretin Receptor Family (15) 

The receptors in the secretin family bind rather large pep- 
tides that share high amino acid identity and most often act 
in a paracrine manner. The secretin family name is related to 
the fact that the secretin receptor was the first one to be
cloned in this family. The term "secretin-like receptor" has
also frequently been used in the literature for receptors in 
this cluster. This group basically corresponds to clan B of the 
A-F system. The N terminus, between ~60 and 80 amino
acids long, contains conserved Cys bridges and is particularly 
important for binding of the ligand to these receptors. The N 
terminus of the vasoactive intestinal peptide receptor (VIPR) 
and pituitary adenylyl cyclase-activating protein (PACAP) 
receptors alone constitutes a functional binding site for the 
ligand. Members of this family are the calcitonin receptor
(CALCR), the corticotropin-releasing hormone receptors 
(CRHRs), the glucagon receptor (GCGR), the gastric inhibi- 
tory polypeptide receptor (GIPR), the glucagon-like peptide 
receptors (GLPRs), the growth hormone-releasing hormone 
receptor (GHRHR), PACAP, the parathyroid hormone recep- 
tors (PTHR), the secretin receptor (SCTR), and VIPR. The 
tree has four main subgroups: the CRHRs/CALCRLs, the 
PTHRs, GLPRs/GCGR/GIPR and the subgroup including se-
cretin and four other receptors. Most of these receptors, 11 of
15, belong to the HOX paralogon, 2q/12q/17q/7/(3p) (see Fig.
4):

CALCR, NP_001733.1, 7q21.3; CALCRL, NP_005786.1, 2q21.1-
q21.3; CRHR1, NP_004373.1, 17q21.31; CRHR2, NP_001874.1,
7p14.3; GCGR, NP_000151.1, 17q25.3; GHRHR, NP_000814.1,
7p14; GIPR, NP_000155.1, 19q13.3; GLP1R, NP_002053.1, 6p21.2;
GLP2R, NP_004237.1, 17p11.2; PACAP, NP_001109.1, 7p14;
PTHR1, NP_000307.1, 3p21.31; PTHR2, NP_005039.1, 2q33;
SCTR, NP_002971.1, 2q14.1; VIPR1, NP_004615.1, 3p22.1; VIPR2,
NP_003373.1, 7q36.3

The Adhesion Receptor Family (24) 

This rather new and peculiar family of GPCRs consists of
receptors with GPCR-like transmembrane-spanning regions 
fused together with one or several functional domains with 
adhesion-like motifs in the N terminus, such as EGF-like 
repeats, mucin-like regions, and conserved cysteine-rich mo- 
tifs (for overview on the N termini in some of these receptors, 
see Hayflick, 2000; Harmar, 2001). The N termini are vari- 
able in length, from about 200 to 2800 amino acids long, and 
are often rich in glycosylation sites and proline residues,
forming what has been described as mucin-like stalks. The 
family name "adhesion" relates to these long N termini, 
which contain motifs that are likely to participate in cell
adhesion (McKnight and Gordon, 1998; Stacey et al., 2000). 
Some receptors in this family have been termed secretin-like 
receptors, and the latrotoxin receptors have previously been 
placed into clan B (Flower, 1999) or clan B2 (Harmar, 2001), 
but our analysis clearly shows that they belong to a distinct 
family of their own. The bootstrap values for the adhesion 
and the secretin families are also very high at 789 and 862, 
respectively, indicating clear distinction between the fami-
lies. The analysis of the full-length proteins also indicates
distinction between the secretin and adhesion families (data 
not shown). Although the phylogenetic analysis by Harmar
(2001) does not stretch beyond "clan B" (secretin and adhe- 
sion), it basically supports our conclusion of separate clusters 
of secretin and adhesion receptors. Our analysis shows that 
several of the receptors appear in clusters of three or four; the 
CELSRs (EGF LAG seven-pass G-type receptors), the brain-
specific angiogenesis-inhibitory receptors (BAIs), the lecto- 
medin receptors (LECs) and the EGF-like module containing 
(EMRs). CD97 antigen receptor (CD97) and EGF-TMVII- 
latrophilin-related (ETL) also group with these on a separate 
main branch. CD97 share highest sequence similarity with 
EMR2 (56%), which is higher than the level of identity within 
the EMRs. The EMRs and CD97 are all positioned on 19p31, 
indicating that they may have arisen through several local 
gene duplications. The other main bremch includes HE6 (TM- 
VIILN2) and GPR56 (TMVIIXNl or TMVIILN4) and a group 
of recently discovered receptors, related to GPR56 and HE6, 
named GPR97 and GPRllO to GPR116 (Fredriksson et al., 
2002). The N termini of the receptors in this branch have 
varying lengths and relatively few identified functional do- 
mains compared with the other main branch of the adhesion 
receptors. Most of the genes of the entire adhesion family are 
positioned within the paralogon l/5p-q21/6p21-p25/9/15qll- 
q26/19p providing support for their common ancestry (Fig. 4): 
BAI1, NP_001693.1, 8q24; BAI2, NP_001694.1, 1p35; BAI3,
NP_001695.1, 6q12; CELSR1, NP_055061.1, 22q13.3; CELSR2,
NP_001399.1, 1p21; CELSR3, NP_001398.1, 3p21.31; CD97,
NP_001775.1, 19p13.13; EMR1, NP_001965.1, 19p13.3; EMR2,
NP_038475.1, 19p13.1; EMR3, NP_11696ai, 19p13.3; ETL,
NP_071442.1, 1p33-p32; GPR97, AY140959, 16q13; GPR110,
AY140952, 6p12.3; GPR111, AY140953, 6p12.3; GPR112,
AY140954, Xq26.3; GPR113, AY140955, 2p23.3; GPR114,
AY140956, 16q13; GPR115, AY140957, 6p12.3; GPR116,
AY140958, 6p12.3; HE6 (GPR64), NP_005747.1, Xp22.22; LEC1,
NP_036434.1, 1p31.1; LEC2, NP_055736.1, 19p13.2; LEC3,
NP_056051.1, 4q13.1; GPR56 (TMVIIXN1), NP_003263.1, 1q42-q43
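
The receptor lists in this section follow a regular "name, accession, chromosomal position" pattern separated by semicolons. As a minimal sketch (not part of the original analysis), such a list could be read into records as follows. The `Entry` type and `parse_entries` function are hypothetical names introduced here for illustration, and the sample string holds a few entries transcribed from the adhesion list above.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    """One receptor record: gene name, accession, chromosomal position."""
    gene: str
    accession: str
    locus: Optional[str]  # None when the list gives no position


def parse_entries(text: str) -> List[Entry]:
    """Split a 'name, accession, locus; ...' list into Entry records."""
    entries = []
    # Collapse line breaks and repeated spaces before splitting on ';'.
    for chunk in " ".join(text.split()).split(";"):
        fields = [f.strip() for f in chunk.split(",") if f.strip()]
        if len(fields) < 2:
            continue  # skip empty or incomplete fragments
        locus = fields[2] if len(fields) > 2 else None
        entries.append(Entry(fields[0], fields[1], locus))
    return entries


sample = (
    "BAI1, NP_001693.1, 8q24; BAI2, NP_001694.1, 1p35; "
    "HE6 (GPR64), NP_005747.1, Xp22.22"
)
for entry in parse_entries(sample):
    print(entry.gene, entry.accession, entry.locus)
```

Entries that lack a chromosomal position are kept with `locus` set to `None` rather than dropped, so that counts per family remain comparable with the numbers given in the headings.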

The Glutamate Receptor Family (15) 

This family of receptors consists of eight metabotropic
glutamate receptors (GRM), two GABA receptors (e.g.,
GABAbR1, which has two splice variants, a and b, and
GABAbR2), a single calcium-sensing receptor (CASR), and five
receptors that are believed to be taste receptors (TAS1).
This group basically corresponds to what has been called
clan C receptors. Several other GABA receptors are found in
the human genome, but these are ion channels. The ligand
recognition domain in the metabotropic glutamate receptors
is found in the N terminus of ~280 to 580 amino acids, and it
has been proposed to share structural homology with
bacterial amino acid binding proteins, such as LIVBP. The
N terminus is believed to form two distinct lobes separated
by a cavity in which glutamate binds, forming a so-called
"Venus fly trap" where the glutamate causes the lobes to
close around the ligand. The CASR also has a long
cysteine-rich N terminus, but it is uncertain whether it is
involved in the binding of Ca2+, even though it is important
for mediating the Ca2+ signal. The N terminus of the GABA
receptors is long and contains the ligand-binding site but
lacks the cysteine-rich domain found in the other receptors
of this family. The TAS1 receptors also have a long
N terminus with a series of conserved Cys residues. They are
expressed in the tongue and are likely to mediate taste
signals. CASR falls with the TAS1 receptors, whereas the two
GABA receptors branch basally in the family. GRM2 and GRM3
share 67% sequence identity and are located in chromosomal
regions 3p and 7q, respectively. GRM7 and GRM8 share 74%
sequence identity and are also positioned on 3p and 7q.
These regions are both part of the postulated 1p/3p/7/22q
paralogon, supporting a common ancestry (Fig. 4):

CASR, NP_000379.1, 3q21.1; GABBR1, NP_001461.1, 6p21.1;
GABBR2(GPR51), NP_005449.1, 9q22.1-q22.3; GRM1,
NP_000829.1, 6q24.3; GRM2, NP_000830.1, 3p21.31; GRM3,
NP_000831.1, 7q21.12; GRM4, NP_000832.1, 6p21.1; GRM5,
NP_000833.1, 11q21.1; GRM6, NP_000834.1, 5q35.3; GRM7,
NP_000835.1, 3p21.1; GRM8, NP_000836.1, 7q31.3-q32.1;
GPRC6A, NP_683766.1, 6q22.1; TAS1R1, NP_619642, 1p36.23;
TAS1R2, NP_689418.1, 1p36.2; TAS1R3, XP_060177.1, 1p36.33

The Frizzled/Taste2 Receptor Family (24) 

This group includes two distinct clusters, the frizzled
receptors and the TAS2 receptors. We were surprised that the
TAS2 receptors clustered together with the frizzled receptors
with a high bootstrap value. There are no obvious similarities
between the receptors in the frizzled branch and the taste
branch of this receptor family. However, when we compared
the TAS2 receptor consensus sequence against an HMM model of
the frizzled receptor branch, we found several features that
may explain why these two groups of receptors cluster
together, such as the consensus sequences IFL in TMII, SFLL
in TMV, and SxKTL in TMVII. None of these motifs is found in
the consensus sequences of the other four families. The TAS2
receptors showed no clear similarities with the TAS1
receptors in the glutamate receptor family. The TAS2
receptors clearly show seven hydrophobic regions in a
hydrophobicity plot, but they have a very short N terminus
that is unlikely to contain a ligand-binding domain. Rather
little is known about the role and function of the TAS2
receptors except that they are expressed in the tongue and
palate epithelium, and it is believed that they function as
bitter taste receptors. We found 13 TAS2 receptors in the
human databases. Two of the receptors we found were not
previously annotated or found in any database. We approached
the HUGO Gene Nomenclature Committee at University College
London, and they confirmed that the sequences were unique and
not public. The committee provided these receptors with new
GPR numbers (GPR59 and GPR60). These numbers had previously
been preliminarily assigned to other receptors but were never
used, which explains the low GPR numbers.

The frizzled receptors control cell fate, proliferation, and
polarity during metazoan development by mediating signals
from secreted glycoproteins termed Wnt. The frizzled name
was first used for a receptor cloned from D. melanogaster, and
the frizzled name (referring to the curled and twisted Wnt
ligand) has frequently been used for this relatively recently
discovered cluster of receptors. It has been shown that Wnt
ligand binding to the rat F2DR can induce G-protein coupling
(Slusarski et al., 1997), providing evidence that the frizzled
proteins are GPCRs. This has also been supported by previous
phylogenetic analyses showing some structural relationship
to GPCRs (Barnes et al., 1998). The frizzled family of
receptors has a 200-amino acid N terminus with conserved
cysteines that are likely to participate in Wnt binding. The
frizzled family consists of 10 frizzled receptors, FZD1-10,
together with SMOH, which is the most divergent receptor of
the family, sharing only 24% identity with FZD2 and less
with the others. The topology of the tree shows four main
clusters in the frizzled branch of receptors: FZD1, -2, and
-7 share approximately 75% identity with each other; FZD8
and -5 share 70% identity; FZD10, -9, and -4 share ~65%
identity; and finally, FZD6 and -3 share 50% amino acid
identity. The identities shared by receptors from different
clusters are between 20 and 40%, indicating that four
parental genes of the frizzled family were formed initially
and that the four clusters of receptors subsequently arose
from these. All the frizzled genes, except FZD6, -3, and -8,
are located in the chromosomal regions belonging to the HOX
paralogy group. In addition, the phylogeny does indicate
that the frizzled family was expanded in the two genome
duplications proposed to have occurred basally in the
vertebrate lineage (see Introduction). This is supported by
the fact that the FZD7, -1, and -2 genes are located on
different paralogous chromosomes, as are FZD9 and -10.
However, if this scenario is true, several genes were lost
(for example, all other copies of the SMOH gene).
Interestingly, all the taste2 receptors from this group are
located in the 1p3/3q/7q/12p/17p paralogon, indicating that
some of these genes were present early in vertebrate
evolution. The fact that the genes are clustered on
chromosomes 7q31 and 12p13 suggests that this family expanded
through several local gene duplications. It is noteworthy
that two of the frizzled receptors, FZD9 and SMOH, are also
located in the same paralogon:
Fig. 3. The phylogenetic relationship between GPCRs
(TMI-TMVII) in the human rhodopsin family. The tree 
was calculated using the maximum parsimony method 
on 300 replicas. The position of the olfactory cluster was 
established by including 17 diverse random receptors 
from the olfactory cluster. These branches were re- 
moved from the final figure and replaced by an arrow 
toward the olfactory receptor cluster. 



FZD1, NP_003496.1, 7q21.13; FZD2, NP_001454.1, 17q21.31;
FZD3, NP_059108.1, 8p21.1; FZD4, NP_036325.1, 11q14.2; FZD5,
NP_003459.1, 2q33-q34; FZD6, NP_003497.1, 8q22.3-q23.1; FZD7,
NP_003498.1, 2q33; FZD8, NP_114072.1, 10p11.21; FZD9,
NP_003459.1, 7q11.23; FZD10, NP_009128.1, 12q24.33; SMOH,
NP_005622.1, 7q32.1; TAS2R13, NP_076409, 12p13; TAS2R14,
NP_076411.1, 12p13; TAS2R7, NP_076408.1, 12p13; TAS2R9,
NP_076406.1, 12p13; TAS2R8, NP_76407.1, 12p13.2; TAS2R3,
NP_058639.1, 7q31.3-q32; TAS2R10, NP_076410.1, 12p13;
TAS2R5, NP_061853.1, 7q31.3-q32; TAS2R4, NP_058640.1,
7q31.3-q32; TAS2R1, NP_062545.1, 5p15; TAS2R16, NP_58641.1,
7q31.1-q31.3; GPR59, XP_069626, 7q33; GPR60, XP_090424, 7q33

The Rhodopsin Family (241 Nonolfactory, Total of 701)

The rhodopsin family has the largest number of receptors,
and the overall analysis is shown in Fig. 3 (except the
olfactory cluster; see comments below). The rhodopsin family
corresponds to what has previously been called either the
rhodopsin-like receptors or clan A in the A-F classification
system.



The rhodopsin family has several characteristics, such as
the NSxxNPxxY motif in TMVII and the DRY motif, or
D(E)-R-Y(F), at the border between TMIII and IL2. Only a few
receptors do not comply with these motifs, but these have
other "fingerprint" elements that clearly link them to the
rhodopsin family, apart from the phylogenetic analysis. The
crystal structure of bovine rhodopsin has been revealed
(Palczewski et al., 2000). Bovine rhodopsin has highest
homology to rhodopsin (RHO) in the opsin receptor group. It
should be noted that bacteriorhodopsin has no sequence
similarity with the GPCRs in the human genome (Josefsson,
1999). The ligands for most of the rhodopsin receptors bind
within a cavity between the TM regions (Baldwin, 1994). There
are, however, important exceptions to this, in particular
for the glycoprotein binding receptors (LH, FSH, TSH, and
LG), where the ligand-binding domain is in the N terminus.
Our analysis showed four main groups. We have opted to call
these main groups α, β, γ, and δ. Results for each of the
groups are described below.
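
The two rhodopsin-family motifs named above can be expressed as simple patterns. The following is a minimal sketch, assuming that "x" denotes any residue and that D(E)-R-Y(F) means D or E, then R, then Y or F. The function name `find_motifs` and the toy sequence are hypothetical illustrations, not data from this study, and the scan ignores where the TM regions lie.

```python
import re

# Motif patterns as written in the text; "." stands for any residue ("x").
NSXXNPXXY = re.compile(r"NS..NP..Y")  # reported in TMVII
DRY = re.compile(r"[DE]R[YF]")        # reported at the TMIII/IL2 border


def find_motifs(seq):
    """Return 0-based start positions of each motif in a protein sequence."""
    seq = seq.upper()
    return {
        "NSxxNPxxY": [m.start() for m in NSXXNPXXY.finditer(seq)],
        "D(E)-R-Y(F)": [m.start() for m in DRY.finditer(seq)],
    }


# Hypothetical toy sequence, not a real receptor.
toy = "MAAADRYLLLLNSAANPLLYKK"
print(find_motifs(toy))
```

A real screen would restrict each search to the predicted TM segment, because a three-residue pattern such as D(E)-R-Y(F) will also occur by chance elsewhere in a protein.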

The α-Group of Rhodopsin Receptors (89). This group
has five main branches: the prostaglandin receptor cluster,
amine receptor cluster, opsin receptor cluster, melatonin
receptor cluster, and MECA receptor cluster. The bootstrap
values that define these branches are very high (267, 262,
290, 299, and 239 of 300, respectively); these are highlighted
in bold in Fig. 3.
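
For readers more used to bootstrap support given as a percentage, the raw counts above can be converted directly. This is a small illustrative sketch; `bootstrap_support` is a hypothetical helper name, and the counts are those quoted in the text for 300 replicates.

```python
def bootstrap_support(counts, replicates):
    """Convert raw bootstrap counts into percent support per branch."""
    if replicates <= 0:
        raise ValueError("replicates must be positive")
    return {
        name: round(100.0 * count / replicates, 1)
        for name, count in counts.items()
    }


# Counts quoted in the text for the five branches (of 300 replicates).
alpha_group = {
    "prostaglandin": 267,
    "amine": 262,
    "opsin": 290,
    "melatonin": 299,
    "MECA": 239,
}
for branch, pct in bootstrap_support(alpha_group, 300).items():
    print(f"{branch}: {pct}%")
```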

The prostaglandin receptor cluster (15). This branch has
eight prostaglandin receptors and seven orphan receptors.
The prostaglandin receptors (PTGERs) are between 19 and
41% identical and share motifs in TMVII (IXDPW) and in
TMI (LXXTDXXG). The PTGERs, except PTGDR and PTGER4, belong
to the paralogous regions on chromosomes
1/5p-q21/6p21-p25/9/15q11-q26/19p, further supporting the
likelihood that the receptors in this group share a common
evolutionary origin (Fig. 4). PTGDR and PTGER4 belong to
the 1q23-q44/2p22-p25/11q13.1-q23.4/14q/15q11-q26/19q/20p
paralogon:

TBXA2R, NP_001051.1, 19p13.3; PTGER3, NP_000948.1, 1p31;
PTGER2, NP_000947.1, 1q22.1; PTGDR, XP_051711.1, 14q22.1;
PTGER4, NP_000949.1, 5p12; PTGIR, NP_000951.1, 19q13.31;
PTGER1, NP_000946.1, 19p13.12; PTGFR, NP_000950.1, 1p31.1;
SREB3, NP_061842.1, Xp11; GPR26, XP_061555.1, 10q26.2;
SREB1(GPR27), NP_061844.1, 3p21-p14; SREB2(GPR85),
NP_061843.1, 7q31; GPR61, NP_114142, 1p13.3; GPR62,
NT_005975.6, 3p21.31; GPR78, NT_006307.5, 4p16.1

The amine receptor cluster (40). The biogenic amine receptor
group contains serotonin receptors (HTR), dopamine receptors
(DRD), muscarinic receptors (CHRM), histamine receptors
(HRH), adrenergic receptors (ADR), trace amine receptors
(TAR), and several orphan receptors. All the known ligands of
the receptors in this group are structurally related small
amine molecules with a single aromatic ring. The degree of
sequence conservation varies among the different classes.
The HTRs display a heterogeneous phylogenetic pattern. Two
distinct subgroups can be seen, the HTR2s and HTR1B-1F. The
rest of the HTRs branch separately or together with other
biogenic amine receptors. These receptors are positioned near
each other on chromosome 5q, suggesting early local gene
duplication. The ADRs form three clusters in the phylogenetic
tree, resulting in branches containing ADRA1, ADRA2, and
ADRB, respectively. The three clusters could be a result of
the postulated vertebrate genome duplications because the
receptor genes, with a few exceptions, are positioned within
the MetaHOX paralogon (Lundin, 1993; Coulier et al., 2000).
This could explain why the sequence identities within the
clusters are more than 45%, whereas the identities between
the groups are about 25%. The TAR subgroup shares 37 to 82%
sequence identity, and the receptors are all positioned on
chromosome 6q23, suggesting several early and late local
gene duplications. This is also evident in rat, which has 14
different TARs with high sequence identity, indicating an
ongoing expansion of this gene family in mammals. Two orphan
GPCRs, GPR57 and GPR58, share sequence similarities with the
TARs. Several motifs, including RliAAKTLG in TMVI and
FKQLHXPTN in TMI, together with the chromosomal data,
strengthen their relationship to the TARs. CHRMs form the
most homogeneous cluster within the amine group, sharing
between 40 and 50% identity. This can be seen in the tree,
with the receptors grouping together with strong bootstrap
support. The DRDs appear in two clusters in the tree: DRD2,
DRD3, and DRD4 on one branch, with DRD4 placed most basal,
and DRD1 and DRD5 together with the β-adrenergic receptors.
Identities within the dopamine clusters are 38 to 52% and
54%, respectively. The sequence identities between the
clusters are ~27%, whereas ADRAB1 and DRD1 are 31% identical.
The serotonin receptors are the largest group, with 13
members distributed more or less over the entire amine group
tree, in general sharing low sequence identity, often as low
as 20%:

HTR1A, NP_000515.1, 5q11.2-q13; HTR5(HTR5A), NP_076917.1,
7q36.3; HTR7, NP_000863.1, 10q21-q24; HRH2, NP_071640.1,
5q35.2; HTR4, NP_000861.1, 5q31-q33; HTR6, NP_000862.1,
1p36-q35; ADRA1A, NP_000671.1, 8p21.2; ADRA1D, NP_000669.1,
20p13; ADRA1B, NP_000670.1, 5q33.1; ADRB1, NP_000675.1,
10q25.3; ADRB3, NP_000016.1, 8p12-p11.2; ADRB2, NP_000015.1,
5q32; DRD5, NP_000789.1, 4p16.1; DRD1, NP_000785.1, 5q35.2;
HTR2B, NP_000858.1, 2q36.3-q37.1; HTR2A, NP_000612.1,
13q14-q21; HTR2C, NP_000859.1, Xq24; TAR1, AAK71236, 8q23.2;
PNR, NP_003958.1, 6q23; TAR3, AAK71240, 6q23.2; TAR4,
AAK71243, 6q23.2; TAR5(GPR102), NP_444508.1, 6q23.2; GPR58,
NP_055441.1, 6q24; GPR57, NP_055442.1, 6q23.2; HTR1B,
NP_000854.1, 6q13; HTR1D, NP_008555.1, 1p36.3-p34.3; HTR1E,
NP_000856.1, 6q14-q15; HTR1F, NP_000857.1, 3p12; ADRA2B,
NP_000673.1, 3p13-q13; ADRA2A, NP_000672.1, 10q25.2;
ADRA2C, NP_000674.1, 4p16; DRD4, NP_000788.1, 11p15.5;
DRD3, NP_000787.1, 3q13.3; DRD2, NP_000786.1, 11q23; HRH4,
NP_067830.1, 18q11.2; CHRM4, NP_000732.1, 11p12-p11.2;
CHRM2, NP_000730.1, 7q31-q35; CHRM1, NP_000729.1, 11q13;
CHRM3, NP_000731.1, 1q43; CHRM5, NP_036257.1, 15q26

The opsins receptor cluster (9). This cluster of receptors
comprises the rod visual pigment (RHO), the three cone
visual pigments (OPN1SW, OPN1LW, OPN1MW), the peropsin
(RRH), the encephalopsin (OPN3), the melanopsin (OPN4), and
the retinal G-protein-coupled receptor (RGR). The opsins are
the only GPCRs that are known to respond to light, and none
of the receptors are known to bind any physical ligand.
OPN1LW and OPN1MW are found in the same chromosomal
position, Xq28. These two proteins are more than 96%
identical, indicating, together with the fact that they are
positioned near one another on Xq, that they share a recent
common ancestor. Phylogenetic comparison of opsins in
different species also indicates that the duplication is
specific for mammals. The phylogenetic analysis divides the
group into three branches: RHO/OPN1SW/OPN1LW/OPN1MW,
RRH/RGR, and OPN3/OPN4. The chromosomal localization of
these receptors is not consistent with any paralogy group,
but it is worth noting that RGR and OPN4 are found in the
same chromosomal position, 10q23:
GPR21, NP_005285.1, 9q33; GPR52, NP_005675.1, 1q24; RHO,
NP_000530.1, 3q21-q24; OPN1LW, NP_064445.1, Xq28; CBP;
OPN1MW, NP_000504.1, Xq28; OPN1SW, NP_001699.1, 7q31.3-q32;
RRH, NP_006574.1, 4q; OPN3, NP_055137.1, 1q43; OPN4,
NP_150598.1, 10q22

The melatonin receptor cluster (3). The analysis discerns
two subgroups in this tree: the melatonin receptors
(MTNR1A, MTNR1B) together with the orphan receptor
GPR50. GPR50 has an extended C-terminal end compared
with the MTNRs, whereas the other regions of the receptor
most closely resemble the MTNRs, especially in the third TM
helix, which is almost identical. GPR50 and MTNR1A both
belong to the ParaHOX paralogon (Fig. 4):
GPR50, NP_004215.1, Xq28; MTNR1A, NP_005949.1, 4q36.1;
MTNR1B, NP_005950.1, 11q21-q22

The MECA receptor cluster (22). This group consists of the
melanocortin receptors (MCRs), endothelial differentiation
G-protein coupled receptors (EDGRs), cannabinoid receptors
(CNRs), and adenosine-binding receptors (ADORAs). Three
orphan receptors also belong to this group (GPR-3, -6, and
-12). It is interesting to note that the receptors in this
group bind structurally different ligands: melanocyte
stimulating hormone (13-residue peptide, MCRs),
lysophosphatidic acid (lipid, EDGRs), anandamide
(arachidonylethanolamide, CNRs), and adenosine. The orphan
receptors are 55% identical to each other and roughly 25%
identical to the MCRs. The orphans share several motifs with
the MCRs, such as PM(Y/F)X(F/L)X(C/G)SLAXADXL in TMIII,
ALXY(H/Y) in TMIV, and PXIYAFR in TMVII. The CNRs share 39%
identity with each other, and their chromosomal positions
indicate a common ancestor, because both genes are located in
the paralogous group involving the positions 1p3 and 6q
(Spring, 1997) (Fig. 4). GPR3 and GPR6 share the same
chromosomal positions as the CNRs, which may indicate that
these orphans share a common ancestor with the CNRs. The MCRs
share between 39 and 56% identity and belong to the
8q/16q/18/20q paralogon, supporting the idea that they share
a common ancestor (Fig. 4). The EDG receptors form clusters
at chromosomes 1p, 9q, and 19p, suggesting two common
ancestors together with one extra gene duplication at
position 19p, resulting in two EDGRs at 1p and 9q, together
with four EDGRs at chromosome 19p. These genes are all
positioned in the paralogy group that was first proposed by
Katsanis et al. (1996) and subsequently expanded by Popovici
et al. (2001), 1/5p-q21/6p21-p25/9/15q11-q26/19p (Fig. 4). All
the adenosine receptors except ADORA1 are located in the
paralogy group 7/16p/17/22q (Fig. 4):

ADORA3, NP_000668.1, 1p13.3; ADORA1, NP_000671.1, 8p21.2;
ADORA2A, NP_000666.1, 22q11.23; ADORA2B, NP_000667.1,
17q12; GPR3, NP_005272.1, 1p35.3; GPR12, NP_005279.1,
13q12.13; GPR6, NP_005275.1, 6q21; MC2R, NP_000520.1,
18p11.2; MC1R, NP_002377.1, 16q24.3; MC3R, NP_063941.1,
20q13.31; MC4R, NP_005903.1, 18q22; MC5R, NP_005904.1,
18p11.2; EDG7, NP_036284.1, 1p22.3; EDG2, NP_001392.1, 9q31.3;
EDG4, NP_004711.1, 19p12; EDG8, NP_110387.1, 19p13.2; EDG5,
NP_004221.1, 19p13.2; EDG6, NP_003766.1, 19p13.3; EDG3,
NP_005217.1, 9q22.1; EDG1, NP_001391.1, 1p21; CNR1,
NP_001831.1, 6q15; CNR2, NP_001832.1, 1p36.11

The β-Group of Rhodopsin Receptors (35). This group
has no main branches and includes 36 receptors (Fig. 3). All
the known ligands to these receptors are peptides. The group
includes the hypocretin receptors (HCRTRs), the neuropeptide
FF receptors (NPFFs), the tachykinin receptors (TACRs), the
cholecystokinin receptors (CCKs), the neuropeptide Y
receptors (NPYRs), the endothelin-related receptors (EDNR
and ETBRLP1/2), the gastrin-releasing peptide receptor
(GRPR), the neuromedin B receptor (NMBR), the uterinbombesin
receptor (BRS3), the neurotensin receptors (NTSRs), the
growth hormone secretagogue receptor (GHSR), the neuromedin
receptors (NMURs), the thyrotropin releasing hormone
receptor (TRHR), the ghrelin receptor, the arginine
vasopressin receptors (AVPRs), the gonadotropin-releasing
hormone receptors (GNRHRs), the oxytocin receptor (OXTR),
and orphan receptors.

The NPY5R groups with the CCK receptors rather than 
with the other NPY receptors. This might seem confusing, 
but it is consistent regardless of the method used (maximum 
parsimony, neighbor joining). One reason for this topology is 
that the NPY5R has a large third extracellular loop that is 
not present in the other NPYRs but is found in the CCK 
receptors. This feature might be the reason for this seemingly 
large difference between the NPY5R and the other NPY 
receptors. If the third extracellular loop of the NPY5R is 
removed, the NPY5R places on the same branch as NPY2R 
(data not shown). Surprisingly, the NPY2R has a higher 
identity to PrRP and GPR72 than to the other NPY receptors. 
The receptor GPR118 is 27% identical to GPR72 whereas the 
identity to the other receptors on that branch is below 20%. 
Several of these receptor clusters (i.e., NPY, NPFF, CCK,
TACR) are positioned within the MetaHOX paralogon,
consisting of chromosomes 4, 5q, 10q21-26, 8p12-22, and
2p11-23 (see Fig. 4). EDNRA and EDNRB are both positioned in
the paraHOX paralogon, 4q/5q/13q/X (Fig. 4). This paralogon
also includes BRS3:

AVPR2, NP_000045.1, Xq28; AVPR1A, NP_000697.1, 12q14.1;
AVPR1B, NP_000698.1, 1q32; EDNRB, NP_000106.1, 13q22.3;
EDNRA, NP_001948.1, 4q31.21; ETBRLP1 (GPR37),
NP_005293.1, 7q31; ETBRLP2, NP_004758, 1q31.3; BRS3,
NP_001718.1, Xq21-q28; CCKAR, NP_000721.1, 4p16.1-p15.2;
CCKBR, NP_000722.1, 11p15.4; Ghrelin(GPR38), NP_001498.1,
13q14-q21; GHSR, NP_004113.2, 3q26.2; GNRHR, NP_000397.1,
4q21.2; GNRHRII, NP_476504.1, 1q12; GRPR, NP_005302.1,
Xp22.1-p22.13; HCRTR2, NP_001517.1, 6p12.1; HCRTR1,
NP_001516.1, 1p33; NTSR1, NP_002522.1, 20q13; NTSR2,
NP_036476.1; NMU2R, NP_064552.1, 5q33.2; NMU1R(GPR66),
NP_006047.1, 2q37.1; NMBR, NP_002502.1, 6q24.1; OXTR,
NP_000907.1, 3p25; NPFF1, NP_071429.1, 1q21-q22;
NPFF2(GPR74), NP_004876.1, 4q21; TACR2, NP_001048.1,
10q22.1; TACR3, NP_001050.1, 4q25; TACR1, NP_001049.1,
2p13.1; TAC3RL, NP_006670.1; NPY5R, NP_006165.1, 4q31-q32;
PPYR1, NP_005963.1, 10q11.21; NPY1R, NP_000900.1, 4q31.3;
PrRP (GPR10), NP_004239.1, 10q25.3-q26; GPR72, NP_057624.1,
11q21; NPY2R, NP_000901.1, 4q31



The γ-Group of Rhodopsin Receptors (59). This group
has three main branches: the SOG receptor cluster, the MCH
receptor cluster, and the chemokine receptor cluster. The
bootstrap values that define these branches are high (276,
299, and 219, respectively) (Fig. 3).

The SOG receptor cluster (15). This cluster of receptors
contains the GALRs that bind to the neuropeptide galanin
and the RF-amide binding receptor GPR54, the somatostatin
receptors (SSTRs), and the opioid receptors (OPRs). GPR7
and GPR8 have recently been shown to bind neuropeptide W.
The known ligands to the receptors in this branch are thus
all peptides, but they themselves share no structural
similarities.

Regarding the somatostatin receptors, we knew that
SSTR1 and SSTR4 are more closely related to each other
than to other SSTRs, whereas the relationship between the
other SSTRs was uncertain. The relationship between
SSTR1 and SSTR4 is strengthened by the fact that they
share the same paralogous group, involving the chromosomal
positions 20p and 14q (Fig. 4). The other three SSTRs belong
to the paralogous regions consisting of chromosomes 7, 16p,
17, and 22q. GPR7 has the highest identity to GPR8 (60.4%).
Their sequence identity to both SSTRs and OPRs is around
33%. It is intriguing to see that these orphans place at the
same positions as OPRK1 and OPRL1, at chromosomal
positions 8q11.23 and 20q13.33, respectively. This indicates
that these orphans may indeed share an evolutionary origin
with the OPRs. The OPRs share 49 to 59% identity and are
all part of the paralogous group consisting of 1p3, 2p, 8q,
6, 16q, 18, and 20q. MCH1R and MCH2R have 32% identity to
each other and 26% to the SSTRs. The structural motifs in
TMI and TMVII are conserved in MCH1R, whereas only the motif
in TMII is conserved in MCH2R, although several other common
features of the group are represented within this receptor
as well. The two GALRs are positioned within the same
paralogous group, 7/16p/17/22q. Motifs such as CCVPFXA in
TMII and YLLP in TMV, together with a relatively high
sequence identity to the GALRs, strongly connect GPR54 to
this cluster of GPCRs:

GPR54, NP_115940.1, 19p13.3; GALR1, NP_001471.1, 18q23;
GALR2, NP_003848.1, 17q25.3; GALR3, NP_003605.1, 22q13.1;
GPR8, NP_000836.1, 7q31.3-q32.1; GPR7, NP_000835.1, 3p26.1;
OPRL1, NP_000904.1, 20p13.3; OPRD1, NP_000902.1,
1p36.1-p34.3; OPRM1, NP_000905.1, 6q25.2; OPRK1, NP_000903.1,
8q11.23; SSTR3, NP_001042.1, 22q13.1; SSTR5, NP_001044.1,
16p13.3; SSTR2, NP_001041.1, 17q25.1; SSTR1, NP_061842.1,
Xp11; SSTR4, NP_001043.1, 20p11.2

The MCH receptor cluster (2). Two receptors branch off the 
SOG cluster with very high bootstrap value. The ligand is the 
melanin-concentrating hormone (MCH), which is a cyclic 
neuropeptide of 19 amino acids that is involved in regulation 
of feeding behavior: 

MCHR2, NP_115892.1, 6q16.2; MCHR1 (GPR24), NP_005288.1,
22q13.2

The chemokine receptor cluster (42). This branch consists of
the classic chemokine receptors (CCRs, CXCRs), the
angiotensin (AGTRs)/bradykinin (BDKRBs)-related receptors,
and a large number of orphan GPCRs. Most of the ligands are
peptides (chemokine, cysteinyl leukotriene, angiotensin,
bradykinin). The topology of the tree and the fact that large
numbers of these receptors appear in clusters on several
chromosomes both point toward a common ancestral origin.
This could be a result of several local gene duplications or,
in the case of receptors appearing in paralogous regions,
genome duplications. A combination of these events might be
the reason for the relatively diffuse phylogenetic topology
of this group.

The AGTR1 and AGTR2 receptors are positioned within the
3q/13q/11q14-q25/17p/19q/Xq paralogon (Fig. 4). The two
BDKRBs are both positioned at 14q32.1, indicating possible
local gene duplications. The genes for the receptors CCR1-5,
CCR8, CCR9(GPR28), CCR11, CCRL2, CX3CR1, CCBP2,
and XCR1 are all positioned on chromosome 3p2, indicating
several local gene duplications. All the chemokine receptors,
except CCR6, CXCR5, and CXCR3, belong to the HOX paralogon
2q/12q/17q/7/(3p) (Holland et al., 1994):
RDC1, NP_051522.1, 2q37.3; AGTRL1, NP_005152.1, 11q12.1;
GPR1, NP_005270.1, 2q33.3; CRTH2(GPR44), NP_004769.1,
11q12.2; AGTR2, NP_000677.1, Xq23; ADMR, NP_009195.1,
12q32.3; AGTR1, NP_000646.1, 3q24; CCR7, NP_001829.1,
17q21.2; CCR6, NP_004358.1, 6q27; CXCR6, NP_006555.1, 3p21;
CCR9, NP_006632.2, 3p21.31; CCR11, NP_057641.1, 3p21.31;
CXCR4, NP_003458.1, 2q21.3; CCR8, NP_005192.1, 3p22.2;
CCRL2, NP_003966.1, 3p21.31; CX3CR1, NP_001328.1, 3p22.2;
CCR4, NP_005499.1, 3p24; CCR1, NP_001286.1, 3p21.31; CCR3,
NP_001828.1, 3p21.31; CCR2, NP_000639.1, 3p21.31; CCR5,
NP_600570.1, 3p21.31; XCR1(CCXCR1), NP_005274.1, 3p21.3;
CCBP2, NP_001287.1, 3p21.31; CXCR5, NP_001707.1, 11q23.3;
CCR10(GPR2), NP_057687.1, 17q21.31; CXCR3(GPR9),
NP_001495.1, Xq13; CXCR1(IL8RA), NP_000625.1, 2q35;
CXCR2(IL8RB), NP_001548.1, 2q35; BDKRB1, NP_000701.1,
14q32.2; BDKRB2, NP_000614.1, 14q32.2; CMKLR1,
NP_004063.1, 12q23.3; C5L2(GPR77), NP_060955.1, 19q13.3;
C5R1, NP_001727.1, 19q13.32; GPR32, NP_001497.1, 19q13.3;
FPR1, NP_002020.1, 19q14.4; FPRL2, NP_002021.1, 19q13.3;
FPRL1, NP_001453.1, 19q13.3; GPR25, NP_005289.1, 1q32.1;
GPR15, NP_005281.1, 3q12.1; BLTR2, NP_062813.1, 14q11.2;
BLTR(LTB4R), NP_000743.1, 14q11.2; SALPR, NP_057652.1,
5p15.1-p14

The δ-Group of Rhodopsin Receptors (58, Plus an
Estimated 460 Olfactory). This group has four main
branches: the MAS-related receptor cluster, the glycoprotein
receptor cluster, the purin receptor cluster, and the
olfactory receptor cluster (not shown in Fig. 3).

The MAS-related receptor cluster (8). This group contains
the MAS1 oncogene receptor (MAS) and the MAS-related
receptors (MRGs and MRGXs). The MRGX family has high
(over 65%) sequence identity. MRGD and MRGF share 30%
identity with the MRGXs, whereas MAS has 25% identity to the
MRGXs. All the MRGX genes, together with MRGF and MRGD, are
located on chromosome 11 and are likely to have arisen in
several very recent gene duplications. MAS, MRG, and the
hypothetical protein are all located on chromosome 6. In a
recent publication, six novel genes, SNSR1-6, were presented
(Lembo et al., 2002). We find that SNSR1-2 are 98%
identical to MRGX3, SNSR3-4 share 98-99% identity with
MRGX1, and SNSR5-6 are 98-99% identical to MRGX4. All
the SNSRs are localized at the same chromosomal position
as the respective MRGX. We have been unable to find the
reported SNSRs, despite numerous searches in the public
genome databases as well as in the Celera database. At
present, we are not certain whether these receptors are
identical or very similar to the MRGX receptors or whether
they are simply not present in the assemblies of the human
genome, either because of errors or because of missing data.
This could also be a result of polymorphisms in the different
libraries used during the screening process:
MAS, NP_002368.1, 6q25.3; MRGF, AAH16964, 11q12.1; MRGX2,
NP_473371.1, 11p15.1; MRGX1, NP_089843.1, 11p15.1; MRGX4,
NP_473373.1, 11p15.1; MRGX3, NP_473372.1, 11p15.1; MRGD,
XP_089955.1, 11q12.2; MRG, NP_443199.1, 6p21.1

The glycoprotein receptor cluster (8). This cluster of
receptors contains the classic glycoprotein hormone receptors
(FSHR, TSHR, and LHCGR) and the leucine-rich-repeat-containing
G-protein-coupled receptors (LGRs). The phylogenetic tree
clearly indicates the presence of three distinct subgroups
within this tree: the relaxin binding LGR7-8, the orphans
LGR4-6, and the glycoprotein hormone receptors. The sequence
identity within these groups is high (54%, 37-52%, and
47-50%, respectively), but the sequence identity among the
groups is low (only 15-22%). The LGR7-8 subgroup belongs to
the paraHOX paralogon (Coulier et al., 2000), and the LGR4-6
group belongs to the 1/11/12 paralogon (Fig. 4). LHCGR and
FSHR are positioned in close proximity on chromosome 2, at
2p16.3, indicating a possible translocation involving the
TSHR gene to chromosome 14:
LGR8, NP_570718.1, 13q13.2; LGR7, NP_067647.1, 4q32;
LGR4(GPR48), NP_060960.1, 11p14.1; LGR6, XP_046692.1,
1q32.1; LGR5(GPR49), NP_003658.1, 12q22-q23; LHCGR,
NP_000224.1, 2p16.3; FSHR, NP_000136.1, 2p16.3; TSHR,
NP_000360.1, 14q31.1
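
The gene lists in this section follow a regular "name, accession, chromosomal position" pattern with entries separated by semicolons. A minimal sketch of how such a list could be parsed and grouped by chromosome is given below. The function names are hypothetical, and the parser assumes each entry ends with exactly these two trailing fields; entries that lack one are skipped rather than guessed.

```python
import re
from collections import defaultdict


def parse_gene_list(text):
    """Split 'NAME, ACCESSION, LOCUS; ...' into (name, accession, locus) tuples."""
    records = []
    for entry in text.split(";"):
        entry = " ".join(entry.split())  # collapse line wraps and extra spaces
        if not entry:
            continue
        # Split from the right so commas inside a name such as
        # 'FKSG77(GPR86, GPR94)' stay part of the name.
        parts = [p.strip() for p in entry.rsplit(",", 2)]
        if len(parts) != 3:
            continue  # a field is missing; skip instead of guessing
        records.append(tuple(parts))
    return records


def group_by_chromosome(records):
    """Map chromosome label (e.g. '2', '14', 'X') to the gene names located on it."""
    groups = defaultdict(list)
    for name, _accession, locus in records:
        match = re.match(r"(\d{1,2}|X|Y)", locus)
        if match:
            groups[match.group(1)].append(name)
    return dict(groups)


sample = """LGR7, NP_067647.1, 4q32; LHCGR, NP_000224.1, 2p16.3;
FSHR, NP_000136.1, 2p16.3; TSHR, NP_000360.1, 14q31.1"""
print(group_by_chromosome(parse_gene_list(sample)))
```

On this sample, LHCGR and FSHR fall together on chromosome 2 while TSHR is placed on chromosome 14, which is the arrangement the text describes.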

The purin receptor cluster (42). This branch consists of the
formyl peptide receptors (FPRs), the nucleotide receptors
(P2Ys), and a large number of orphan GPCRs. The known
ligands include extracellular nucleotides for the purin receptors,
leukotrienes, and thrombins. The nucleotide-binding and
related receptors have the most diffuse topology within this
group. These receptors contain the nucleotide binding receptors
(P2Ys), the formyl peptide binding receptors (FPRs), the
thrombin receptors (F2Rs), the cysteinyl leukotriene receptors
(CYSLTs), and orphan GPCRs. A proportion of this dispersed
receptor group (i.e., 19 of 38 of these receptors) belongs
to the same paralogon: 3q/13q/11q14-q25/17p/19q/Xq
(Fig. 4). The phylogenetic pattern suggests that many local
gene duplications occurred before the proposed chromosomal
duplications. This might explain why the phylogenetic
relationship of these receptors is hard to resolve. This is because
the receptors would then have appeared during a short period
and evolved and diversified over a relatively long period,
resulting in a diverse group of receptors without a clear
sub-branching resolution. Of the remaining receptors, six are
located on 1q, five on 14q, three on 5q, and two on 19p, where
1q, 5q, and 19p belong to the same paralogous group. The
sequence identity is in general low (~20%), although several
pairs of genes have higher mutual identity:
GPR18, NP_005283.1, 13q32; PTAFR, NP_000943.1, 1p36.11; G2A,
NP_037477.1, 14q32.3; EBI2, NP_004942.1, 13q32.3;
P2Y11(P2RY11), NP_002557.1, 19p13.2; GPR92, NP_065133.1,
12p13.31; C3AR(C3AR1), NP_004045.1, 12p13.31; P2Y9(GPR23),
NP_005287.1, Xq21.31; P2Y5, NP_005768.1, 13q14.2; FKSG79,
NP_115942.1, Xq21.1; P2Y10, NP_055314.1, Xq21.1; GPR17,
NP_005282.1, 2q14.3; F2RL3, NP_003941.1, 19p13.11; F2RL2,
NP_004092.1, 5q13.1; F2R, NP_001983.1, 5q13.1; F2RL1,
NP_005233.1, 5q13.1; GPR87, NP_076404.1, 3q25.1; GPR105,
NP_056694.1, 3q25.1; P2Y12, NP_073625.1, 3q25.1;
FKSG77(GPR86, GPR94), NP_076403.1, 3q25.1; CYSLT1,
NP_006630.1, Xq21.1; CYSLT2, NP_065110.1, 13q14.2;
GPR80(GPR99), XP_062888.1, 13q32.1; GPR91, NP_149039.1,
3q25.1; P2Y6(P2RY6), NP_004145.1, 11q14.1; P2Y1(P2RY1),
NP_002554.1, 3q25.2; P2Y2(P2RY2), NP_002555.1, 11q13.1;
P2Y4(P2RY4), NP_002556.1, Xq13.1; FKSG80(GPR81),
NP_115943.1, 12q24.31; HM74, NP_006009.1, 12q24.31; GPR35,
NP_005292.1, 2q37.3; GPR55, NP_005674.1, 2q37; GPR65,
NP_003599.1, 14q31.3; OGR1(GPR68), NP_003476.1, 14q31;
GPR4, NP_005273.1, 19q13.3; H963, NP_037440.1, 3q25.1; GPR82,
NP_543007.1, 1; TRHR, NP_003292.1, 8p23; RE2, NP_031395.1,
1p36.13-q31.3; GPR103, NT_006337.5, 4q26; RGR, NP_002912.1,
10q22.3; GPR101, NP_473362.1, Xq26.3

Fig. 4. The positioning of the GPCRs in paralogon groups in the human genome. Frames indicate the paralogons (PGs) according to Lundin (1993),
Holland et al. (1994), Katsanis et al. (1996), Sidow (1996), Pebusque et al. (1998), Kasahara (1999), and Holland (1999), further extended in Popovici
et al. (2001). Red, 2q/12q/17q/7/(3p) [PG 10 (HOX paralogon)]; dark blue, 1p/3p/7/22q (PG 11); light blue, 3q/13q/11q14-q25/17p/19q/Xq (PG 6/7); dark
green, 1/5p-q21/6p21-p25/9/15q11-q26/19p (PG 3); light green, 1p3, 2p, 8q, 6, 16q, 18, and 20q (PG 13/14); orange, 4p16.3, 5q, 10q21-26, 8p12-22/
2p11.23 [PG 9 (Meta HOX)]; yellow, 1p21.1-p13.1,lql^44/11p/12/19q (PG 1); purple, 4q/5q/13q/X [PG 8 (ParaHOX)]; brown, 7/16p/17/22q (PG 12);
black, 1q23-q44/2p22-p25/11q13.1-q23.4/14q/15q11-q26/19q/20p (PG 4).

The olfactory receptor cluster (estimated at 460). Our
searches and manual inspection of the resulting data files,
looking at each of the genes individually, indicated that there
are 460 olfactory receptors in the human genome that we
consider likely to represent unique functional receptors (data
not shown). Our phylogenetic analysis indicates that these
proteins form a stable phylogenetic cluster, without spreading
to other groups of the rhodopsin family or other families
(data not shown). We do not show phylogenetic analyses of all
these genes here because further work is needed to carefully
match each of the sequences with expressed sequence tags,
do comparative analysis of the NCBI and Celera databases,
and annotate all these genes. We randomly picked 17 of these
olfactory receptor sequences, one from each of the 17 main
branches that formed in our preliminary phylogenetic analysis.
This provided us with a diverse olfactory receptor data
set that we used in the overall rhodopsin analysis to determine
the olfactory node that appears in Fig. 3 in the δ-group
in the rhodopsin family.

Three hundred forty-seven putative human full-length
odorant receptor genes have previously been identified and
physically cloned (Zozulya et al., 2001). It has also been
suggested that there are more than 900 olfactory receptor-like
sequences in the human genome (Venter et al., 2001).
About 60% of these genes are estimated to be pseudogenes.
Glusman et al. (2001) reported 322 odorant genes and a
number of pseudogenes in the human genome. They also
estimate that there were more than 900 olfactory receptor-like
genes in the genome. The same number of 322 odorant
genes was also reported by Takeda et al. (2002). The large
clusters of olfactory receptors are found in paralogous regions
distributed on 13 human chromosomes, further supporting
the general observation that the human olfactory receptors
share a common origin. Moreover, it is worth mentioning
that the human olfactory receptors show low or little resemblance
to chemosensory receptors in nematodes (Robertson,
1998) or the fruit fly (Mombaerts, 1999).

Other 7TM Receptors (23) 

Some of the 7TM genes could not be included in any family/
group/cluster with appreciable bootstrap values. We have
therefore chosen to present these receptors in this section as
other 7TM receptors, although they clearly do not belong to
the same group. The ligand for most of these receptors is not
yet known. The instability in the topology is related to certain
atypical parts of their sequences that could be a result of
a chimeric origin of the receptors or of evolutionary pressure
not shared by their closest phylogenetic neighbors. Most of
these receptors give stable topology if they are analyzed with
a limited number of sequences (for example, the 5-20 closest
BLAST hits), but when analyzed in such a large and diverse
data set, the atypical parts are more likely to cause an
unstable topology. It is not uncommon in phylogenetic analysis
to delete atypical parts from the proteins to avoid such
"problems". We did not, however, perform any such manipulation,
in order to avoid biased handling of the data set. The atypical
parts of the proteins are often found in the loops rather
than the TM regions. An example of this is the histamine
HRH1 and HRH3 receptors, which have a large third intracellular
loop of about 170 amino acids, which is significantly
longer than in most other rhodopsin family receptors of the
α-group (where they obviously belong). When we analyze the
amine receptor cluster alone, HRH1 and HRH3 show stable
topology; in our large data set, however, they do not, which
explains why they have ended up in this section. We also
want to mention that at least 53 V1 vomeronasal receptor
genes have been reported to be in the human genome (Lane
et al., 2002). We approached Dr. Barbara Trask (Columbia
University, NY), and she kindly provided us with a file with
these 53 genes, which all look like pseudogenes except one
(V1RL1). V1RL1 is found here because it does not show clear
phylogenetic relationship to any of the main families. Lane et
al. (2002) reported that there were three clusters of these
genes found on HSA1, HSA7 and HSA19:
GPRC5B, NP_071319, 17q25; GPRC5C, NP_016235.1, 16p12;
GPRC5D, NP_061124.1; GPR, NP_009154.1, 15q13.3; GPR14,
NP_061822.1, 17q25.3; GPR19, NP_006134.1, 12p12.3; GPR20,
NP_005284.1, 8q24.2-q24.3; GPR22, NP_005286.1, 7q22-q31.1;
CMKRL2(GPR30), NP_001496.1, 7p22; GPR31, NP_005290.1,
6q27; GPR34, NP_005291.1, Xp11.4-p11.3; GPR40, NP_005294.1,
19q13.12; GPR41(GPR42), NP_005295.1, 19q13.12; GPR43,
NP_005297.1, 19q13.12; GPR39, NP_001499.1, 2q21-q22; GPR63,
NP_110411.1, 6q16.1-q16.3; GPR75, NP_006785.1, 2p16; GPR84,
NP_065103.1, 12q13.13; HRH1, NP_000852.1, 3p25; HRH3,
NP_009163.1, 20q13.33; SREB2(GPR85), NP_061843.1, 7q31;
VLGR1, XP_057299, 5q13; V1RL1, NP_065684, 19q13.43

Discussion 

This is the first phylogenetic study of the entire superfamily
of GPCRs in a single mammalian genome. The analyses
show with high bootstrap support that there are five main
families of human GPCRs (Fig. 2). Each of the receptors that
we placed in the five families shows appreciable bootstrap
value in support of a phylogenetic relationship to the respective
family. The results indicate that the members within
each family share a common evolutionary origin. We have
given the families the following names: glutamate, rhodopsin,
adhesion, frizzled/taste2, and secretin, and we refer to
them as the GRAFS families or the GRAFS classification,
based on the initials of the family names. The rhodopsin
receptors make up the largest family, and we show four main
groups (Fig. 3) with 13 distinct branches. We chose not to
subdivide the other families.

Three of the families, the rhodopsin (A), secretin (B), and
glutamate (C) families, correspond to the A-F clan system
(Attwood and Findlay, 1994; Kolakowski, 1994), whereas the
two other families, adhesion and frizzled, are not included in
the clan system. We did not find receptors in the human
genome that belong to families that correspond to clans D, E,
F, or O. All the receptors, except 23, were designated as
members of one of the GRAFS families. We found 342 functional
nonolfactory GPCRs in our searches of the human
database. Combining this number with the preliminary number
of olfactory receptors we identified (460), the total number
of functional GPCRs in the human genome is more than
800. Our analysis thus covers about 2% of the genes in the
human genome. We are not aware that simultaneous phylogenetic
analysis has previously been performed on such a
large and complex data set from a single genome. It may
seem to be a daunting task to analyze the remaining 98% of
the human genome covering the other protein families. We
believe, however, that our "manual" approach, inspecting
sequence for sequence, group to group, is important to provide
clarity into numbers and phylogenetic topology of the
proteins in the genome. We believe that our results will be
valuable for analyzing the mouse, rat, chicken, fugu, and
zebrafish genomes to determine the orthologous relationship
of the GPCRs in these other genomes, which are already
available or are soon to be completed.
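
The totals quoted in this paragraph can be checked with simple arithmetic. The sketch below adds the two counts given in the text and back-computes the genome-wide gene count that a share of "about 2%" would imply. That implied count is an inference made here for illustration and is not a figure stated in the text; the variable names are hypothetical.

```python
nonolfactory_gpcrs = 342   # functional nonolfactory GPCRs reported in the text
olfactory_gpcrs = 460      # preliminary olfactory receptor count from the text

total_gpcrs = nonolfactory_gpcrs + olfactory_gpcrs

# Assumption: "about 2%" is treated as exactly 0.02 for this estimate.
share_of_genes = 0.02
implied_gene_count = total_gpcrs / share_of_genes

print(total_gpcrs)                # total functional GPCRs ("more than 800")
print(round(implied_gene_count))  # gene count implied by a 2% share
```

A different assumed share (for example 2.5%) would shift the implied gene count considerably, so the second number is only a rough consistency check.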

The phylogenetic relationship of the secretin and secretin-like
receptors (a term widely used in connection to a variety
of receptors) in the human genome has been unclear. Our
analysis shows one distinct family of receptors whose ligands
are rather large peptides that mainly act in a paracrine
manner; we term these the secretin family. However, we also
show that there exists another distinct family of receptors
that we name, for the first time, the adhesion family. Many of
these receptors have very long N termini and most of them
have adhesion molecule repeats that are likely to participate
in cell-to-cell interactions. Previously, it had been suggested
that the metabotropic glutamate receptors belong to the
same family as the calcium and GABA receptors (Bockaert
and Pin, 1999). Our analysis confirms this and shows that
the two GABA receptors branch basally in the glutamate
family. A few recently found taste receptors (TAS1) also
group into the glutamate family. The fifth family is made up
of the frizzled receptors and a number of taste receptors
(TAS2). It is important to note that the taste receptors in
groups TAS1 and TAS2 do not show any phylogenetic relationship;
to add to the confusion, some olfactory receptors
have TAS names (probably given by mistake, to our best
knowledge).

It has often been stated that the different GPCR families
show no structural similarities. Bockaert and Pin (1999)
wrote that "There are at least six families of GPCRs showing
no sequence similarity". In fact, several 7TM receptors (for
example, bacterial rhodopsin, several chemosensory receptors
in C. elegans, and olfactory receptors in D. melanogaster)
show very low or no similarities to any GPCR in the human
genome (Robertson, 1998; Mombaerts, 1999). Repeated
BLAST searches on GPCRs from various species have implied
that three overall classes of GPCRs may exist (Josefsson,
1999). A recent study analyzing GPCRs from a number
of highly divergent species showed 34 distinct clusters with
significant alignment between distantly related clusters
(Graul and Sadee, 2001). It is important to note that our
phylogenetic analysis does not reveal clear evidence of a
common descent of the GRAFS families. However, visual
inspections of the alignments disclose features that are
shared within the families beyond the feature of seven hydrophobic
regions. All the families have a conserved Cys
between TMI and TMII and another conserved Cys between
TMIII and TMIV. These residues are believed to create a
disulfide bridge between these loops and to be important for
the structural integrity of the protein. The conservation of
these two single amino acids obviously does not have an
impact on the phylogenetic analysis. This is because of the
distance between them, the variability in the length of the
receptors, and because these bridges do not seem to need
defined structural surroundings, probably because they are
found in the flexible extracellular loops. It should be noted
that the actual physical presence of these bridges has not
been shown for all the different families, although it is very
well established that these are functionally crucial for several
receptors within the rhodopsin family.

To further analyze the putative similarities between the
families, we extended our analysis by generating HMMs for
each family. The families may share several regions that are
well conserved between the groups that are not evident by
looking at the alignment alone. We subsequently tried to
align the TM regions of the HMMs. Several motifs shared by
some families emerged, as exemplified by the alignments
shown in Fig. 5. All the proteins in each family (except the
olfactory cluster in rhodopsin) contribute to these HMM consensus
sequences in Fig. 5. We found it remarkable that all
the consensus sequences derived from the GRAFS families
aligned, without generating long or repeated gaps, with their
respective TM regions, with only a few exceptions (the glutamate
and frizzled families did not align in TMIII and
TMVII). The TM consensus sequences could not be aligned to
a "wrong" TM region, meaning, for example, that any consensus
sequence from TMI could not be aligned with the
consensus sequences from TMII, TMIII, TMIV, TMV, TMVI,
or TMVII (data not shown). The consensus alignment created
"consensus residues", marked by dark shading in Fig. 5.
None of these consensus residues is conserved through all
five families, but six of them were found in four families.
Moreover, the nonidentical residue in the same position as
these six consensus residues is also a hydrophobic residue in
all cases except one. Furthermore, in three cases, the fifth
residue is a valine that is closely related structurally to the
consensus residue leucine. The boundaries of the TM regions
are defined by hydrophobicity plots (see the Introduction),
and it is thus no surprise that the alignable residues are
hydrophobic. This could indicate, however, that the sequence
similarity may be caused by functional constraints related to
the α-helical structure that passes the lipophilic membrane
rather than common descent. It should be noted, however,
that the hydrophobicity varies notably from one α-helix to
another, and none of the sequence similarities is repeated in
more than one helix. Visual inspection shows that the numbers
of identical residues clearly differ from one helix to
another, indicating a nonrandom pattern. Different hydrophobicity
patterns from one helix to another could be attributed
to different positioning in the seven-helical cluster that
makes up the receptor, enabling signal transduction through
the membrane to the G-proteins. Considering the crystal
structure of bovine rhodopsin, the TMIII for example is oriented
in the middle of the TM cluster, whereas TMIV and
TMV are more exposed to the membrane (Baldwin, 1994;
Palczewski et al., 2000). Whether the clustering of these
hydrophobic residues is related to common TM orientation or
to other important structural features, we are inclined to
believe that they add support for a possible common descent
of the GRAFS families. We also find it intriguing that although
none of the repeated residue motifs are clearly shared
by all the five families, they all can be connected through
motifs in two or more families. In TMII, the glutamate and
the frizzled families align in a seven-residue consensus sequence
in which five residues are identical and the difference
lies in Val and Leu in one position and two polar residues,
Thr and Arg, in the other nonidentical position. The adhesion
and secretin families share several short motifs in TMI,
TMII, TMVI, TMVII, and also an 11-amino acid motif in
TMV, where eight residues are identical. The adhesion and
secretin families link to the frizzled family in TMIV with a
G/AWG/AXPAIW motif, where X is always hydrophobic; it should
be noted that P and W are rather unusual residues in α-helices.
The rhodopsin family has no long sequence motif that
links it clearly to any of the other families. However, two
three-amino acid motifs are found in TMIV and TMVI that
link the rhodopsin family to the glutamate family and adhesion
families, respectively. Moreover, all six positions that
have four identical residues include the rhodopsin family; for
example, the Trp that is a part of the strong motif in TMIV
links the adhesion, frizzled, and secretin families. Thus, the
results indicate that primary sequences are shared within
the families. The HMM approach applied here and the subsequent
alignment is also more sensitive than using simple
sequence alignments; further application of such methods
could be the key to identifying more conserved motifs between
the groups. Considering the direct sequence similarities
mentioned above, together with the putative conserved
Cys bridge in all families and the TM region-dependent
alignment pattern displayed in Fig. 5, we suggest thus that
there is confounding evidence that the human GPCRs that
we assigned to the GRAFS families share a common ancestor.

Fig. 5. Alignment of the consensus sequences of the region around the
TM-regions for the five families of human GPCRs. The consensus sequences
are statistically derived using HMMalign as described under
Materials and Methods. Black boxes indicate that the residue is conserved
between four of the five families, dark gray indicates conservation
between three families, and light gray denotes conservation between two
families.
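
The shading scheme described for Fig. 5 (black for a residue shared by four of the five families, dark gray for three, light gray for two) can be expressed as a per-column count over aligned consensus sequences. The following is a minimal sketch under stated assumptions: the consensus strings are toy, made-up fragments rather than the HMM-derived consensus sequences, the function name is hypothetical, and gap characters are ignored when counting.

```python
from collections import Counter

FAMILIES = ("glutamate", "rhodopsin", "adhesion", "frizzled", "secretin")

# Shading levels named in the Fig. 5 caption. The caption defines no shade
# for a residue shared by all five families, so that case maps to None here.
SHADE = {4: "black", 3: "dark gray", 2: "light gray"}


def shade_columns(consensus):
    """Return one shade (or None) per alignment column."""
    seqs = [consensus[family] for family in FAMILIES]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("consensus sequences must be aligned to equal length")
    shades = []
    for column in zip(*seqs):
        counts = Counter(residue for residue in column if residue != "-")
        most_shared = counts.most_common(1)[0][1] if counts else 0
        shades.append(SHADE.get(most_shared))
    return shades


# Toy aligned fragments (hypothetical, for illustration only).
toy = {
    "glutamate": "LVA-",
    "rhodopsin": "LVGW",
    "adhesion":  "LIGF",
    "frizzled":  "LAGY",
    "secretin":  "AISH",
}
print(shade_columns(toy))
```

In the toy input the first column is shared by four families, the second by two, the third by three, and the fourth by none, which exercises each level of the scheme.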

We created a chart showing how the GPCRs are found in
different paralogy groups (see Fig. 4). This figure shows how
several of the GPCRs are located in paralogous regions on the
chromosomes. When these groups are studied together with
the phylogenetic trees, they demonstrate how a large number
of these receptor genes are likely to have been formed
through tetraploidizations, whereas others are more likely to
have arisen through local gene duplications. Another piece of
information that is obtained from the paralogons is the putative
mechanism for how the different gene subfamilies in
the adhesion family have been composed from different domains.
All of the genes in the adhesion family, of course,
contain the code for the seven TM regions; apart from this,
many of them also have distinct elements in the N termini
that can be recognized in various other gene families. We
predicted that it might be possible to trace some of the major
evolutionary events of putative domain shuffling. We compared
the chromosomal locations of these adhesion family
genes with the chromosomal locations of the genes that
might be supposed to carry the parental domains in question.
The three BAI genes are located in the group of paralogous
chromosomal regions, 1p3/2p/8q/20q, originally described by
Spring et al. (1994) and later extended to contain parts of 6p,
6q, 16q, and 18. Two of the LEC genes, the EMR genes, and
CD97, as well as ETL, GPR56 (TMVIIXN1), and one of the
CELSR genes, belong to the paralogon 1p-q2/6p/9/19p (Katsanis
et al., 1996), later extended to include parts of 5p-q2
and 15q. These two paralogy groups have two human chromosomal
regions in common, 1p3 and 6p2, which may give an
indication that the ancestral regions of these groups might
have been syntenic or arisen from a common region at an earlier
stage of vertebrate evolution (Lundin et al., 2002). Furthermore,
they share the 1p3 region with a third paralogon,
1p3/3q/7q/12p/17p. It was suggested that the parental genes of
the ones found in the 10 main regions included in these
paralogy groups, plus the likely translocated regions, once
could have been syntenic in an early prevertebrate. According
to this scenario, this ancestral region duplicated twice as
a result of the postulated genome doublings, and these four
newly formed regions must then successively have split up
into a larger number of regions, except the one in chromosome
1p3. It is thus interesting to see that most of the genes
that are likely to have contributed to the several different
domains seen in genes from the adhesion family are also
present in these three paralogy groups. Four of the subfamilies
(BAI, CD97, EMR, and LEC) contain a mucin domain.
Mucin genes are located at 1q22, 3q21.2, 3q29, 6q21, 7q22,
and 19p13.2. The LEC subfamily carries an olfactomedin
domain, and the two olfactomedin genes mapped in the human
genome are located on 9q34.3 and 19p13.2. Genes of the
BAI subfamily have several thrombospondin domains, and
the three human thrombospondin genes are mapped at
1q21, 6q27, and 15q15. The CELSR genes carry cadherin
domains, and no less than 16 cadherin genes are located at
5p14-13, 8q22, 16q21-24, 18q, and 20q13. Furthermore, the
CELSR genes contain two laminin A domains, and laminin A
genes have been mapped to 6q21-22 (2), 18p11, 18q11, and
20q13. Genes from three of the subfamilies, CD97, EMR, and
CELSR, also carry EGF-like domains, and two of the human
EGFL genes are found at 1p36.3 and 9q32-33. It does seem
likely that all the genes mentioned in this connection were
linked in the same chromosomal region in an early metazoan
and that unequal crossing-over between parental genes in
this region caused exon shuffling, leading to the structures
found in extant genes of the adhesion family.

In summary, we have generated the first map for one of the
most studied superfamilies of proteins found in the human
genome. We demonstrated the existence of five distinct families
of GPCRs, and we determined the relationship of the
genes within subgroups of the large rhodopsin family. This
map will be very useful for comparison of GPCRs in other
species and will subsequently enhance our understanding of
how structural and functional properties evolved. The paralogon
analysis presents further evidence for common descent
of the phylogenetic clusters and exemplifies how exon shuffling
may have played a role in composition of some of the
receptor genes. Because of the diversity of structural elements
found in this family, it is likely that the examples of
evolutionary mechanisms that are predicted here may have a
general importance for several other protein families, typically
those that share α-helical domains and TM regions that
are combined with other functional elements.